Creating multidimensional arrays similar to MATLAB - Python

I have this MATLAB code and I need to do the same in Python:
for t = 1:AnzahlSegA
    zsegaussen{t,1} = werteAussen(t,SpalteSegStartA):Abstandrechnung:werteAussen(t,SpalteSegEndeA);
end
This code creates a cell array of size AnzahlSegA x 1 and fills each row with a vector of numbers. Each vector starts at a given value (taken from werteAussen) and adds further values according to the step size (in this case Abstandrechnung) until it reaches the end value. werteAussen is a matrix that contains the needed information.
I wanted to do the same in Python but couldn't come up with anything good. My best effort is this:
zsegmente = [[] for i in range(row)]
while listenstart < row and p < row:
    x: int
    for x in range(int(AllesAussen.iloc[p, 0]), int(AllesAussen.iloc[p, 1]), Schrittweite):
        zsegmente[listenstart].append(x)
        if len(zsegmente[listenstart]) == (int(AllesAussen.iloc[p, 1] - int(AllesAussen.iloc[p, 0])) / Schrittweite):
            listenstart += 1
            p += 1
print(zsegmente)
My idea was to have a list containing lists, each holding the same information as one row of the MATLAB array. As it turned out, this is not easy to use and modify, so I really need to get a multidimensional array somehow.
I appreciate any help.
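Because each row can have a different length, a regular 2-D NumPy array cannot hold these rows directly; the MATLAB code builds a cell array, and its closest Python analogue is a list of 1-D NumPy arrays. The sketch below assumes the start and end values are available as two sequences (for example the columns `AllesAussen.iloc[:, 0]` and `AllesAussen.iloc[:, 1]` from the question); the helper names `segments` and `pad_to_matrix` are hypothetical. `np.arange` excludes its stop value, unlike MATLAB's colon operator, so the stop is extended by half a step. If a true 2-D array is really needed, the rows can be padded with NaN.

```python
import numpy as np

def segments(starts, ends, step):
    """One 1-D array per row, like MATLAB's start:step:end (end included)."""
    # np.arange stops *before* its stop value, so push the stop out by half a step
    return [np.arange(s, e + step / 2, step) for s, e in zip(starts, ends)]

def pad_to_matrix(rows, fill=np.nan):
    """Stack rows of unequal length into a 2-D array, padding the gaps with `fill`."""
    width = max(len(r) for r in rows)
    out = np.full((len(rows), width), fill)
    for i, r in enumerate(rows):
        out[i, :len(r)] = r
    return out

segs = segments([0, 10], [6, 14], 2)  # made-up start/end values
m = pad_to_matrix(segs)
print(segs)
print(m)
```

Padding with NaN keeps row-wise operations possible (e.g. `np.nanmean(m, axis=1)`) at the cost of some memory; the list of arrays is usually the simpler choice.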

Related

How to only iterate over one index of a matrix array if both indices have the same variable in Python?

I am trying to eliminate some nonzero entries in a matrix whose two diagonals adjacent to the main diagonal are nonzero.
h = np.zeros((n**2,n**2))
for i in np.arange(0, n**2):
    for j in np.arange(0,n**2):
        if(i==j):
            for i in np.arange(0,n**2,n):
                h[i,j-1] = 0
print(h)
I want it to eliminate only the nonzero entries in the lower triangle, but it is erasing some entries in the upper triangle too. I know this is because of the for loop inside the last if statement: I want it to iterate only over the first index i, but since I set i == j, it runs for both.
The matrix I want to obtain is the following:
[Image: desired matrix]
PS: sorry for the extremely bad question format, this is my first question.
hamiltonian = np.zeros((n**2,n**2)) # store the Hamiltonian
for i in np.arange(0, n**2):
    for j in np.arange(0,n**2):
        if abs(i-j) == 1:
            hamiltonian[i,j] = 1
Is this what you are looking for?:
hamiltonian[0,1] = 1
hamiltonian[n**2-1,n**2-2] = 1
for i in np.arange(1, n**2-1):
    hamiltonian[i,i+1] = 1
    hamiltonian[i,i-1] = 1
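Neither loop above clears the entries the question asks about. In the question's code, the innermost loop rebinds i while j keeps its outer value, so `h[i,j-1] = 0` zeroes column j-1 in every row that is a multiple of n, which includes positions above the diagonal (and for i = 0 it writes h[0, -1], which wraps to the last column). Assuming the desired matrix (an image in the original post) has ones on both off-diagonals except the lower-diagonal entries h[i, i-1] for rows i = n, 2n, ..., a loop-free sketch could be (`off_diagonal_matrix` is a hypothetical name):

```python
import numpy as np

def off_diagonal_matrix(n):
    size = n ** 2
    # ones directly above (k=1) and directly below (k=-1) the main diagonal
    h = np.eye(size, k=1) + np.eye(size, k=-1)
    # rows n, 2n, ...: clear only the lower-diagonal entry h[i, i-1]
    # (row 0 has no lower-diagonal entry, so the range starts at n)
    rows = np.arange(n, size, n)
    h[rows, rows - 1] = 0
    return h

print(off_diagonal_matrix(3))
```

If the matching upper-diagonal entries should be cleared as well, `h[rows - 1, rows] = 0` does that.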

K Means in Python from Scratch

I have Python code for a k-means algorithm.
I am having a hard time understanding what it does.
Lines like C = X[numpy.random.choice(X.shape[0], k, replace=False), :] are very confusing to me.
Could someone explain what this code is actually doing?
Thank you
import numpy
from copy import deepcopy

# dist, dist1 and sil_coefficient are helper functions defined elsewhere in the original script
def k_means(data, k, num_of_features):
    # Make a matrix out of the data (DataFrame.as_matrix() was removed in pandas 1.0)
    X = data.to_numpy()
    # Get k random points from the data
    C = X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop will run until the error becomes zero or after 5 tries
    tries = 0
    while error != 0 and tries < 5:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Error is the distance between where the centroids used to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X,clusters,k)
X is the data, as a matrix.
Using the [] notation, we take slices of, or select single elements from, the matrix. You may want to review NumPy array indexing: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
numpy.random.choice(X.shape[0], k, replace=False) picks k distinct integers at random from range(X.shape[0]), i.e., k row indices of the data matrix. replace=False means no index is picked twice.
Notice that the indexing inside [] has two entries: the numpy.random.choice(...) call and ":".
":" indicates that we are taking everything along that axis.
Thus, X[numpy.random.choice(X.shape[0], k, replace=False), :] means we select k indices along the first axis and take every element along the second axis for each of them. Effectively, we are selecting k distinct random rows of the matrix.
(The comments explain this code quite well; I would suggest you read up on NumPy indexing and list comprehensions for further elucidation.)
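A small, made-up example shows what the indexing expression returns. It uses NumPy's newer `Generator.choice`, which takes the same `(a, size, replace)` arguments as `numpy.random.choice`, only so that the result is reproducible:

```python
import numpy as np

X = np.arange(12).reshape(6, 2)  # 6 points (rows), 2 features (columns)
k = 3
rng = np.random.default_rng(0)
idx = rng.choice(X.shape[0], k, replace=False)  # k distinct integers from 0..5
C = X[idx, :]  # the rows listed in idx, with every column
print(idx)
print(C)  # each row of C is one of the rows of X
```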
C = [C[j][:-1] for j in range(len(C))]
The right-hand side is a list comprehension that builds a new list from parts of the matrix C.
C[j] represents the rows of the matrix C.
We use the [:-1] to take up to, but not including the final element of the row. We do this for each row in the matrix C. This removes the last column of the matrix.
C = numpy.asarray(C). This converts the list of rows back into a 2-D NumPy array, so we can use NumPy operations on it.
C_old = numpy.zeros(C.shape). This creates a zero matrix of the same shape as C; it will hold the previous centroids later.
clusters = numpy.zeros(len(X)). This creates a zero vector with one entry per row of X; it will later hold the cluster assigned to each point.
error = dist(C, C_old, None). Take the distance between the two matrices. I believe this function to be defined elsewhere in your script.
tries = 0. Set the tries counter to 0.
while ...: repeat this block as long as the condition is true (the centroids are still moving and the try limit has not been reached).
for i in [0...(number of rows in X - 1)]:
clusters[i] = dist1(X[i][:-1], C); Put which cluster the ith row of X is closest to in the ith position of clusters.
C_old = deepcopy(C). Create an independent copy of C rather than a second reference to the same array, so later in-place changes to C do not also change C_old.
for i in [0...(k - 1)], i.e., for each cluster:
points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]. This is a list comprehension. It creates a list of the rows of X, without their last entry, but only includes row j if it belongs to the ith cluster.
if not points. If nothing belongs to the cluster.
C[i][:] = numpy.zeros(C[i].shape). Set the ith row of the centroid matrix C to a vector of zeros, i.e., put that centroid at the origin.
else:
C[i] = numpy.mean(points, axis=0). Set the ith row of the centroid matrix C to the average point of the cluster. Averaging over axis=0 (down the rows) gives one mean per column (feature). This is where the centroids are updated.
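The assignment and update steps described above can also be written without a Python loop over the points. The sketch below assumes Euclidean distance (dist1 is not shown in the question, so this is an assumption), assumes X already contains only feature columns (the question's code strips a last column with [:-1]), and keeps the rule that an empty cluster goes to the origin; `kmeans_step` is a hypothetical name:

```python
import numpy as np

def kmeans_step(X, C):
    """One k-means iteration: assign points to the nearest centroid, then recompute centroids."""
    # squared Euclidean distances via broadcasting, shape (n_points, k)
    d2 = ((X[:, np.newaxis, :] - C[np.newaxis, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # index of the closest centroid for each point
    new_C = np.zeros(C.shape)
    for i in range(len(C)):
        members = X[labels == i]
        if len(members) > 0:  # an empty cluster stays at the origin
            new_C[i] = members.mean(axis=0)
    return labels, new_C

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
C = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, C = kmeans_step(X, C)
print(labels)  # [0 0 1 1]
print(C)
```

Repeating `kmeans_step` until the centroids stop changing (or a try limit is reached) gives the same loop structure as the question's `while` loop.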

How to generate a matrix with random entries and with constraints on row and columns?

How can I generate a matrix whose entries are random real numbers between zero and one inclusive, with the additional constraint that the sum of each row must be less than or equal to one and the sum of each column must be less than or equal to one?
Example:
matrix = [0.3, 0.4, 0.2;
          0.7, 0.0, 0.3;
          0.0, 0.5, 0.1]
If you want a matrix that is uniformly distributed and fulfills those constraints, you probably need a rejection method. In Matlab it would be:
n = 3;
done = false;
while ~done
    matrix = rand(n);
    done = all(sum(matrix,1)<=1) & all(sum(matrix,2)<=1);
end
Note that this will be slow for large n.
If you're looking for a Python way, this is simply a transcription of Luis Mendo's rejection method. For simplicity, I'll be using NumPy:
import numpy as np
n = 3
done = False
while not done:
    matrix = np.random.rand(n,n)
    done = np.all(np.logical_and(matrix.sum(axis=0) <= 1, matrix.sum(axis=1) <= 1))
If you don't have NumPy, then you can generate your 2D matrix as a list of lists instead:
import random
n = 3
done = False
while not done:
    # Create matrix as a list of lists
    matrix = [[random.random() for _ in range(n)] for _ in range(n)]
    # Compute the row sums and check for each to be <= 1
    row_sums = [sum(matrix[i]) <= 1 for i in range(n)]
    # Compute the column sums and check for each to be <= 1
    col_sums = [sum([matrix[j][i] for j in range(n)]) <= 1 for i in range(n)]
    # Only quit if all row and column sums are at most 1
    done = all(row_sums) and all(col_sums)
The rejection method will give you a uniformly distributed solution, but it might take a long time to generate a valid matrix, especially if your matrix is large. So another, more tedious approach is to generate each element such that the sum can never exceed 1 in either direction. For this, you generate each new element between 0 and the remainder left before reaching 1:
n = 3
matrix = zeros(n+1); %dummy line in first row/column
for k1=2:n+1
    for k2=2:n+1
        matrix(k1,k2)=rand()*(1-max(sum(matrix(k1,1:k2-1)),sum(matrix(1:k1-1,k2))));
    end
end
matrix = matrix(2:end,2:end)
It's a bit tricky because for each element you check the row sum and column sum so far and use the larger of the two when generating the new element (in order to stay below a sum of 1 in both directions). For practical reasons, I padded the matrix with a zero row and column at the beginning to avoid indexing problems with k1-1 and k2-1.
Note that, as @LuisMendo pointed out, this will have a different distribution from the rejection method. But if your constraints do not concern the distribution, this could work as well (and it gives you a matrix in a single run).
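A NumPy transcription of this sequential method might look like the sketch below (`sequential_matrix` is a hypothetical name). Because an empty slice sums to 0 in NumPy, the padding row and column from the MATLAB version are not needed:

```python
import numpy as np

def sequential_matrix(n, rng=None):
    """Fill an n x n matrix element by element so no row or column sum exceeds 1."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            # budget already used by row r (left of c) or column c (above r), whichever is larger
            used = max(m[r, :c].sum(), m[:r, c].sum())
            m[r, c] = rng.random() * (1 - used)
    return m

m = sequential_matrix(4, np.random.default_rng(1))
print(m)
print(m.sum(axis=0), m.sum(axis=1))
```

As with the MATLAB version, elements filled early tend to be larger than those filled late, so this is not uniformly distributed.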

Walk through each column in a numpy matrix efficiently in Python

I have a very big two-dimensional array in Python, using the NumPy library. I want to walk through each column efficiently and count the elements that are different from 0 in every column.
Suppose I have the following matrix.
M = np.array([[1,2], [3,4]])
The following code enables us to walk through each row efficiently, for example (it is not what I intend to do of course!):
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
However, in my case I want to walk through each column efficiently, since I have a very big matrix.
I've heard that there is a very efficient way to achieve this using numpy, instead of my current code:
curr_col, curr_row = 0, 0
while (curr_col < numb_colonnes):
    result = 0
    while (curr_row < numb_rows):
        # If different from 0
        if (M[curr_row][curr_col] != 0):
            result += 1
        curr_row += 1
    # ... using result value ...
    curr_col += 1
    curr_row = 0
Thanks in advance!
In the code you showed us, you treat NumPy's arrays as lists, and as far as you can see, it works! But arrays are not lists, and if you treat them as lists, there is little point in using arrays, or even NumPy.
To really exploit the usefulness of NumPy, you have to operate directly on arrays, writing, e.g.,
M = M*M
when you want to square the elements of an array, and use the rich set of NumPy functions that operate directly on arrays.
That said, I'll try to get a bit closer to your problem...
If your intent is to count the elements of an array that are different from zero, you can use the numpy function sum.
Using sum, you can obtain the sum of all the elements in an array, or you can sum across a particular axis.
import numpy as np
a = np.array(((3,4),(5,6)))
print(np.sum(a)) # 18
print(np.sum(a, axis=0)) # [ 8 10]
print(np.sum(a, axis=1)) # [ 7 11]
Now you are protesting: I don't want to sum the elements, I want to count the nonzero elements... but
if you write a logical test on an array, you obtain an array of booleans. For example, to test which elements of a are even:
print(a%2==0)
# [[False  True]
#  [False  True]]
False is zero and True is one, at least when we sum them...
print(np.sum(a%2==0)) # 2
or, if you want to sum over a column, i.e., the index that changes is the 0-th
print(np.sum(a%2==0, axis=0)) # [0 2]
or sum across a row
print(np.sum(a%2==0, axis=1)) # [1 1]
To summarize, for your particular use case
by_col = np.sum(M!=0, axis=0)
# use the counts of non-zero terms in each column, stored in an array
...
# if you need the grand total, use sum again
total = np.sum(by_col)
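`np.count_nonzero` also accepts an `axis` argument (NumPy 1.12 and later), which states the intent directly and gives the same counts as the boolean sum. When a column really does need custom processing, iterating over `M.T` visits one column at a time, although that is still a Python-level loop and slower on a very big matrix. A small sketch with a made-up matrix:

```python
import numpy as np

M = np.array([[1, 0, 3],
              [0, 0, 4]])

by_col = np.count_nonzero(M, axis=0)  # nonzero count per column
same = np.sum(M != 0, axis=0)         # the boolean-sum approach from the answer
print(by_col)                         # [1 0 2]

# iterating over the transpose yields the columns one by one
for col_idx, col in enumerate(M.T):
    print(col_idx, np.count_nonzero(col))
```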

Array element evaluation from reverse

I'm still very new to Python and programming, and I'm trying to figure out whether I'm going about this problem in the correct fashion. I tend to take a MATLAB approach to things, but here I'm just struggling...
Context:
I have two NumPy arrays, plotted in this image on Flickr (I can't post photos here). They have the same shape (both 777x1600), and I'm trying to use the red array to help return the index (value on the x-axis of the plot) and element value (y-axis) of the point in the blue plot indicated by the arrow, for each row of the blue array.
The procedure I've been tasked with is to:
a) determine the max value of the red array (represented with a red dot in the figure; already achieved), and
b) start at the end of the blue array with the final element and count backwards, comparing each element to the preceding one. The goal is to determine where the preceding value decreases (for example, when element -1 is greater than element -2, indicative of the last peak in the image). Additionally, to avoid selecting "noise" at the tail end of the section with elevated values, the selected value must also be larger than the maximum of the red array.
Here's what I've got so far, but I'm stuck on line two where I have to evaluate the selected row of the array from the (-1) position in the row to the beginning, or (0) position:
for i,n in enumerate(blue): #select each row of blue in turn to analyze
    for j,m in enumerate(n): #select each element of blue ??how do I start from the end of array and work backwards??
        if m > m-1 and m > max_val_red[i]:
            indx_m[i] = j
            val_m[i] = m
To answer your question directly, you can use n[::-1] to reverse the array n.
So the code is:
for j, m in enumerate(n[::-1]):
    j = len(n)-j-1  # index of m in the original, unreversed n
    # here is your code
But to increase calculation speed, you should avoid Python loops:
import numpy as np
n = np.array([1,2,3,4,2,5,7,8,3,2,3,3,0,1,1,2])
idx = np.nonzero(np.diff(n) < 0)[0]
peaks = n[idx]
mask = peaks > 3 # peak must be larger than 3
print("index=", idx[mask])
print("value=", peaks[mask])
The output is:
index= [3 7]
value= [4 8]
I assume you mean:
if m > n[j-1] and m > max_val_red[i]:
    indx_m[i] = j
    val_m[i] = m
because m > m - 1 is always True.
To reverse an array on an axis you can index the array using ::-1 on that axis, for example to reverse blue on axis 1 you can use:
blue_reverse = blue[:, ::-1]
Try to see whether you can write your function as a set of array operations instead of loops (that tends to be much faster). This is similar to the other answer, but it should let you avoid both of the loops you're currently using:
threshold = red.max(1)
threshold = threshold[:, np.newaxis] #this makes threshold's shape (n, 1)
blue = blue[:, ::-1]
index_from_end = np.argmax((blue[:, :-1] > blue[:, 1:]) & (blue[:, :-1] > threshold), 1)
value = blue[range(len(blue)), index_from_end]
index = blue.shape[1] - 1 - index_from_end
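One caveat about the vectorized answer: `np.argmax` returns 0 for a row in which the condition is never True, so such a row silently reports its last element. The sketch below runs the same computation with a guard, wrapped in a hypothetical helper `last_peak_above` and applied to small made-up arrays; rows without a match get index -1 and value NaN:

```python
import numpy as np

def last_peak_above(blue, red):
    """Per row of blue: index and value of the last element that exceeds both its
    predecessor and the row maximum of red; (-1, nan) where no element qualifies."""
    threshold = red.max(axis=1)[:, np.newaxis]  # shape (rows, 1)
    rev = blue[:, ::-1]  # each row reversed, so position 0 is the last element
    # rev[:, m] is an original element and rev[:, m + 1] its predecessor
    hit = (rev[:, :-1] > rev[:, 1:]) & (rev[:, :-1] > threshold)
    found = hit.any(axis=1)  # argmax alone would report 0 for rows with no hit
    idx_from_end = hit.argmax(axis=1)  # first True = last qualifying element
    rows = np.arange(blue.shape[0])
    value = np.where(found, rev[rows, idx_from_end], np.nan)
    index = np.where(found, blue.shape[1] - 1 - idx_from_end, -1)
    return index, value

blue = np.array([[1, 5, 3, 2, 4, 1],
                 [0, 1, 2, 3, 4, 5],
                 [3, 3, 3, 3, 3, 3]])
red = np.array([[0, 2, 1, 0, 0, 0],
                [1, 2, 0, 0, 0, 0],
                [2, 0, 0, 0, 0, 0]])
index, value = last_peak_above(blue, red)
print(index)
print(value)
```

For these inputs the first row yields index 4 (value 4), the second index 5 (value 5), and the flat third row has no match.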
Sorry, I didn't read all of it, but you could look into the built-in function reversed.
So instead of enumerate(n), you can do enumerate(reversed(n)). But then your index counts from the end; the correct index into n is len(n) - 1 - j.
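Combining backward iteration with the corrected comparison (element greater than its predecessor and greater than the red maximum), a plain-Python sketch for a single row could look like this; `last_rise` is a hypothetical helper, and `threshold` stands in for max_val_red[i] from the question:

```python
def last_rise(row, threshold):
    """Return (index, value) of the last element that exceeds its predecessor
    and the threshold, scanning from the end; None if there is no such element."""
    for j in reversed(range(1, len(row))):
        if row[j] > row[j - 1] and row[j] > threshold:
            return j, row[j]
    return None  # no qualifying element in this row

print(last_rise([1, 5, 3, 2, 4, 1], 2))  # (4, 4)
```

Returning at the first hit means the scan stops as soon as the last peak is found, instead of walking the rest of the row.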
