I'm trying to run a cycle that takes a matrix, vector-by-vector, then multiplies these vectors into a new matrix with some variation in each.
My code, however, alters not only the last-added vector but all of them.
Here's a simplified sample:
origVector = [0,0,0,0]
nVectors=[]
for j in range(3):
    print ("iteration: " + str(j))
    nVectors.append(origVector)
    nVectors[-1][0] += j+1
    #at this point it should only change the last-added vector,
    #but it changes them all
    print(nVectors)
I've spent the last half day trying to figure out what's wrong with my referencing (e.g. the matrixname[-1][0] reference works fine when it is not inside the cycle).
Can someone please point me in the right direction? Thanks in advance!
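A likely cause, assuming standard Python list semantics: append stores a reference to origVector rather than a copy, so every row of nVectors is the same list object. A minimal sketch of one way around it, appending a fresh copy on each iteration (the snake_case names are only illustrative):

```python
orig_vector = [0, 0, 0, 0]
n_vectors = []
for j in range(3):
    # list(...) builds a new list, so each row is independent of orig_vector
    n_vectors.append(list(orig_vector))
    # only the row appended in this iteration is modified
    n_vectors[-1][0] += j + 1

print(n_vectors)  # [[1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0]]
```

If the vectors themselves contained nested lists, copy.deepcopy would be needed instead of list().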
Given
\sum_{k=0}^{n}k^3 = \frac{n^2(n+1)^2}{4}
I need to compute the left hand side. I should make a list of the first 1000 integer cubes and sum them.
Then I should compute the right hand side and do the same.
Also, I am supposed to compare the computational time for the above methods.
What I've done so far is:
import time
start = time.clock()
list = []
for n in range(0,1001):
    list.append(n**3)
print(list)
print("List of the first 1000 integer cubes is:",list, "and their sum is:", sum(list))
stop = time.clock()
print("Computation time: ",stop-start, "seconds.")
a=0
for n in range (0,1001):
    a=(int)(n*(n+1)/2)
print ("Sum of the first 1000 integer cubes is:",a*a)
First part for the left hand side works fine, but the problem is the right hand side.
When I set n=4, I get the same result for both sides, but the problem occurs when n is big: one side comes out bigger than the other, i.e. they are not the same.
Also, can you help me create a list for the right hand side? I've tried doing something like this:
a=[]
for n in range (0,10):
    a.append(int)(n**2(n+1)**2/4)
But it doesn't work.
For the computational time, I think I am supposed to set one more timer for the right hand side, but then again I am not sure how to compare them.
Any advice is welcome.
OK, I think what you tried in the for loop is completely wrong. If you want to check the formula "the hard way" (that is, without induction), you simply need to sum the cubes of the first n integers and compare the result against your formula, evaluated once (not n times inside a loop).
So evaluate:
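n=100
s=0
# range(1, n+1) runs from 1 through n inclusive
for i in range(1,n+1):
    s+=i**3
# n**2 * (n+1)**2 is always divisible by 4, so integer division (//) is exact;
# plain / returns a float in Python 3 and can lose precision for large n
s_ind=(n**2)*((n+1)**2)//4
print(s==s_ind)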
Otherwise, if you are supposed to use induction, it's all on paper: the base case is a_1 = 1^3 = ((1)^2*(2)^2)/4 = 1, and the step condition is a_n - a_(n-1) = sum(i^3)[i=1,...,n] - sum(j^3)[j=1,...,n-1] = n^3, which you then check also holds when a_n is given by your formula (spoiler alert: it does ;) ).
Hope this helps.
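To compare the computation time of the two sides, one option is to time each separately. The sketch below assumes Python 3, where time.perf_counter is available (time.clock was removed in Python 3.8); the function names are made up for illustration. It also builds the list of right-hand-side values the question asks about.

```python
import time

def sum_of_cubes_loop(n):
    # Left-hand side: add up k**3 for k = 0..n
    return sum(k**3 for k in range(n + 1))

def sum_of_cubes_formula(n):
    # Right-hand side: closed form, kept in exact integer arithmetic with //
    return n**2 * (n + 1)**2 // 4

n = 1000

start = time.perf_counter()
lhs = sum_of_cubes_loop(n)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
rhs = sum_of_cubes_formula(n)
formula_seconds = time.perf_counter() - start

# Right-hand side evaluated for the first few n, as a list
rhs_list = [sum_of_cubes_formula(m) for m in range(10)]

print("both sides equal:", lhs == rhs)
print("loop:", loop_seconds, "s, formula:", formula_seconds, "s")
```

A single run is noisy; for a fairer comparison the timeit module can repeat each call many times.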
I have a vector A of size 100k+ and I want to calculate the distance between every element of this vector and every other element. I am trying to solve this problem in R, using its built-in adist function and also trying the stringdist package.
The problem is that it is computationally very heavy and keeps running for days without finishing.
The end problem that I am trying to solve is finding duplicates or near-duplicates using a distance measure and then building some sort of classification model around it.
The code I am currently using is:
# declare an empty data frame and append data to it
matchedStr_vecA <- data.frame(row_index = integer(),
                              col_index = integer(),
                              vecA_i = character(),
                              vecA_j = character(),
                              dist_diff_vecA = double(),
                              stringsAsFactors=FALSE)
k = 1 # (keeps track of the pointer to the data frame)
# Run 2 different loops to calculate the bottom half of the matrix (below the diagonal -
# as the diagonal elements will be zero and the upper half is the mirror image of the bottom half)
for (i in 1:length(vecA)) {
  for (j in 1:length(vecA)) {
    if (i < j) {
      dist_diff_vecA <- stringdist(vecA[i], vecA[j], method = "lv")
      matchedStr_vecA[k,] <- c(i, j, vecA[i], vecA[j], dist_diff_vecA)
      k <- k + 1
    }
  }
}
Please help me bring this computation from O(n^2) down to O(n). I am fine with using Python as well. I was told that this can be solved using dynamic programming, but I am not sure how to implement it.
Thanks all
I had the very same problem of calculating the distance matrix and I successfully solved it in Python. The crucial elements of the solution, which ensure you split the calculations equally between threads, are discussed in this question:
How to split diagonal matrix into equal number of items each along one of axis?
There are two things to point out:
The distance between two points is typically symmetrical so you can reuse this mathematical feature and calculate distance between i and j elements once and either store it or reuse it for the distance between j and i.
The algorithm cannot be optimized below O(n^2) unless you are OK with imprecise results. And since you are new to programming I would not even consider going that way.
You should be able to parallelize the calculations using index splitting as I suggested in the question above for a near-optimal solution.
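The symmetry point above can be sketched in Python as follows. This is only an illustration, not the approach from the linked question: the function names and the max_dist threshold are assumptions, the Levenshtein distance is written out with the usual dynamic-programming recurrence, and the overall cost is still O(n^2) pairs. The length check is a cheap filter that skips pairs which cannot be within the threshold.

```python
from itertools import combinations

def levenshtein(a, b):
    """Edit distance using the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def near_duplicates(strings, max_dist):
    """Return (i, j, distance) for each unordered pair within max_dist."""
    result = []
    # combinations yields each pair once (i < j), so d(i, j) is never recomputed as d(j, i)
    for (i, s), (j, t) in combinations(enumerate(strings), 2):
        # the length difference is a lower bound on the edit distance
        if abs(len(s) - len(t)) > max_dist:
            continue
        d = levenshtein(s, t)
        if d <= max_dist:
            result.append((i, j, d))
    return result

print(near_duplicates(["apple", "appel", "banana"], max_dist=2))
```

The pair list produced by combinations can be divided among worker processes for the parallel version; how to balance that split is what the linked question covers.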
k=3000
UnqLab = unique(TrainingLabels)
n = length(UnqLab)
count=hist(TrainingLabels,UnqLab);
num = 1;
for i = 1:n
    fprintf('\n %i',i)
    nn = count(i)
    for j = 1:nn
        NTrainingFeatures(num,:) = TrainingFeatures(num,ranking(i,1:k))
        num = num +1;
    end
end
Here TrainingLabels is of size 21,000 * 1 and contains 257 labels in sorted order, for example 001,001,001,001,001,001.....002,002,002.......257,257.
TrainingFeatures is of size 21,000 * 4096 and contains some values.
ranking is of size 257 * 4096 and contains the feature ranks; for example, the first row looks like (3076,456,765,4000,87,5,.....). This means that, for all entries of the first label, column 3076 of TrainingFeatures has been given the first rank.
This code takes too much computation time (days). Is there any way to make it take less time? Code in MATLAB or Python would work.
In general, in MATLAB you want to avoid loops. In particular, it looks like your biggest issue is your inner loop; instead of stepping through the count(i) items one at a time, it would be faster to copy the whole block at once. This should be possible because you take the same set of columns of TrainingFeatures for every row in the block (neither i nor k changes inside the inner loop). So you should be able to do something like
NTrainingFeatures(num:num+count(i)-1,:) = TrainingFeatures(num:num+count(i)-1,ranking(i,1:k));
num = num + count(i);
I would definitely test this out (perhaps on a subset of your data or smaller matrices) to make sure everything lines up properly. Without access to your code to test, I may have made a mistake in setting up the indices, or perhaps the matrix shapes don't match. If you are having trouble getting your matrix indices and shapes to line up, you can try using reshape() or using single-index calling.
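Since the question also accepts Python, the same block-copy idea can be sketched with NumPy. The function name is made up, and the sketch assumes that the labels are sorted, that ranking has one row per unique label in the same order, and that ranking holds 0-based column indices (MATLAB indices are 1-based, so they would need to be shifted by one).

```python
import numpy as np

def select_ranked_features(features, labels, ranking, k):
    """Copy the top-k ranked columns for each label block, one slice per label."""
    _, counts = np.unique(labels, return_counts=True)
    out = np.empty((features.shape[0], k), dtype=features.dtype)
    start = 0
    for i, count in enumerate(counts):
        stop = start + count  # Python slices exclude the end index
        # all rows of this label share the same k columns, so copy them at once
        out[start:stop, :] = features[start:stop, ranking[i, :k]]
        start = stop
    return out

# Tiny example: 6 samples, 4 features, 3 labels, keep the top 2 columns
features = np.arange(24).reshape(6, 4)
labels = np.array([1, 1, 1, 2, 2, 3])
ranking = np.array([[3, 0, 1, 2], [1, 2, 3, 0], [0, 3, 2, 1]])
print(select_ranked_features(features, labels, ranking, k=2))
```

Preallocating the output also avoids growing the result row by row, which is a common source of slowness in both MATLAB and NumPy.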
I'm trying to avoid using for loops to run my calculations, but I don't know how to do it. I have a matrix w with shape (40,100). Each row holds the position of a wave at a time t. For example, the first row w[0] is the initial condition (as is w[1], for reasons that I will show).
To calculate the elements of the next row I use, for every t and x in the shape range:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
Here a and b are constants that come from the solution of the equation (the details don't matter): a = 2(1-r), b = r, r = (c*(dt/dx))**2, where c is the wave speed and dt, dx are the increments in the t and x directions.
Is there any way to avoid a for loop like:
for t in range(1,nt-1):
    for x in range(1,nx-1):
        w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
nt and nx are the dimensions of the w matrix.
I assume you're setting w[:,0] and w[:,-1] beforehand (to some constants?) because I don't see it in the loop.
If so, you can eliminate the for x loop by vectorizing this part of the code:
for t in range(1,nt-1):
    w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
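A self-contained way to check that the sliced update matches the double loop, assuming w is a NumPy array whose first two rows and boundary columns are already filled in. The function names and the test values are illustrative only.

```python
import numpy as np

def propagate_loop(w, a, b):
    """Reference version: explicit loops over t and x (updates w in place)."""
    nt, nx = w.shape
    for t in range(1, nt - 1):
        for x in range(1, nx - 1):
            w[t + 1, x] = a * w[t, x] + b * (w[t, x - 1] + w[t, x + 1]) - w[t - 1, x]
    return w

def propagate_sliced(w, a, b):
    """Same update with the x loop replaced by array slices."""
    nt, _ = w.shape
    for t in range(1, nt - 1):
        # w[t, :-2] is the left neighbour and w[t, 2:] the right neighbour of w[t, 1:-1]
        w[t + 1, 1:-1] = a * w[t, 1:-1] + b * (w[t, :-2] + w[t, 2:]) - w[t - 1, 1:-1]
    return w

r = 0.25                      # assumed value of (c*dt/dx)**2
a, b = 2 * (1 - r), r
w0 = np.zeros((40, 100))
w0[0, 1:-1] = np.sin(np.linspace(0.0, np.pi, 98))  # made-up initial condition
w0[1] = w0[0]

same = np.allclose(propagate_loop(w0.copy(), a, b), propagate_sliced(w0.copy(), a, b))
print(same)
```

The loop over t has to stay, because each row depends on the two rows before it.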
Not really. If you want to do something for every element in your matrix (which you do), you're going to have to operate on each element in some way or another. The most obvious way is with a for loop; less obvious methods will either perform the same or worse.
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this where you're looking at the neighboring elements, a good solution is using some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations frequently). Like, suppose for each element you wanted to take the average of it and all the things around it (this is how blurring images works). Each pixel has 8 neighbors, so the average will be the sum / 9. Well, let's say you save the sums of the columns (save NW + W + SW, N + me + S, NE + E + SE). Well when you go to the next one to the right, just sum the values of your previous middle column, your previous last column, and the values of a new column (the new ones to the right). You just replaced adding 9 numbers with adding 5. In operations that are more complicated than addition, reducing 9 to 5 can mean a huge performance increase.
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember multiplication is much more expensive than addition. So if you had a loop where, for instance, you had to multiply some number by the loop variable, instead of doing 1x, 2x, 3x, ..., you could do (value last time + x).
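The blur example above can be sketched roughly as below. This is a sliding-window variant of the idea, with a made-up function name: column sums are computed once per row, and moving one pixel to the right only adds the column entering the window and subtracts the one leaving it. It returns the 3x3 neighbourhood sums for interior pixels; dividing each by 9 gives the average.

```python
def box_sums_3x3(img):
    """3x3 neighbourhood sums for interior pixels, reusing saved column sums."""
    rows, cols = len(img), len(img[0])
    out = [[0] * (cols - 2) for _ in range(rows - 2)]
    for r in range(1, rows - 1):
        # sum of the three vertically adjacent pixels in every column
        col = [img[r - 1][c] + img[r][c] + img[r + 1][c] for c in range(cols)]
        window = col[0] + col[1] + col[2]   # window centred on column 1
        out[r - 1][0] = window
        for c in range(2, cols - 1):
            # slide right: add the entering column, drop the leaving one
            window += col[c + 1] - col[c - 2]
            out[r - 1][c - 1] = window
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
print(box_sums_3x3(image))  # [[54, 63]]
```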
This may be as much a maths problem as a code problem. I decided to learn how 3D engines work, and I'm following this guide, http://petercollingridge.appspot.com/3D-tutorial/rotating-objects, but converting the code to Python. In the function for rotating about the Z-axis, my code looks like this:
def rotate_z(theta):
    theta=math.radians(theta)
    for i in ords:
        i[0]= i[0]*math.cos(theta) - i[1]* math.sin(theta)
        i[1]= i[1]*math.cos(theta) + i[0]* math.sin(theta)
This rotates the nodes by the appropriate amount, but over maybe 5 seconds, or 150 frames, the nodes start to slowly move together until, about 20 seconds in, they coalesce. My initial thought was that it was a rounding-down problem in the last two lines, but I am stuck. Any ideas, anyone?
It looks like the problem is that you're changing the value of i[0] when you need the old value to set i[1]:
i[0]= i[0]*math.cos(theta) - i[1]*math.sin(theta)  # <-- You change the value of i[0]
i[1]= i[1]*math.cos(theta) + i[0]*math.sin(theta)  # <-- You use the changed value of i[0], not the original
So the value of i[0] gets replaced, when you still want to keep it.
You can solve this by using separate variables (as Peter Collingridge does):
for i in ords:
    x = i[0]
    y = i[1]
    i[0]= x*math.cos(theta) - y*math.sin(theta)
    i[1]= y*math.cos(theta) + x*math.sin(theta)
This way, you should not get the "feedback loop" which results in the points gradually floating together.
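Put together as a runnable function, the fix might look like the sketch below. Passing the point list in as a parameter (instead of using the global ords) and precomputing the cosine and sine are choices made here for illustration, not part of the original tutorial code.

```python
import math

def rotate_z(points, theta_degrees):
    """Rotate each [x, y, z] point in place about the Z-axis."""
    theta = math.radians(theta_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for p in points:
        # keep the old coordinates so both updates use the original values
        x, y = p[0], p[1]
        p[0] = x * cos_t - y * sin_t
        p[1] = y * cos_t + x * sin_t

points = [[1.0, 0.0, 0.0], [0.0, 2.0, 5.0]]
for _ in range(360):          # 360 small steps of one degree each
    rotate_z(points, 1)

# the distance from the Z-axis should be unchanged after a full turn
radii = [math.hypot(p[0], p[1]) for p in points]
print(radii)
```

With the original version the radius shrinks a little on every call, which is why the nodes drift together over many frames.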