I want to generate floating-point numbers between 0 and 1 that are not random. I would like the range to consist of 4200 values, so in Python I did 1/4200 to find the step size needed to go from 0 to 1 in 4200 steps. This gave me 0.0002380952380952381, and I confirmed it by checking that 0.0002380952380952381 * 4200 == 1 (in Python). I have tried:
y_axis = [0.1964457, 0.20904465, 0.22422191, 0.68414455, 0.5341106, 0.49412863]
x1 = [0.18536805, 0.22449078, 0.26378343, 0.73328144, 0.63372454, 0.60280087, 0.49412863]
y2_axis = [0.18536805, 0.22449078, 0.26378343, ..., 0.73328144, 0.63372454, 0.60280087, 0.49412863]
plt.plot(pl.frange(0, 1, 0.0002380952380952381), y_axis)
plt.plot(x1, y2_axis)
This returns: ValueError: x and y must have same first dimension, but have shapes (4201,) and (4200,)
I would like help resolving this; any other method that works would also be appreciated. I am sure other solutions are available and this may be long-winded. Thank you.
To generate the numbers, you can use a list comprehension:
[i/4200 for i in range(4201)]
Numpy makes this really easy:
>>> import numpy as np
>>> np.linspace(0, 1, 4200)
array([ 0.00000000e+00, 2.38151941e-04, 4.76303882e-04, ...,
9.99523696e-01, 9.99761848e-01, 1.00000000e+00])
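To tie this back to the plotting error: the x array just needs the same length as the y data, so (assuming y_axis really holds 4200 values; the random stand-in below is only a placeholder for your data) something like this avoids the (4201,) vs (4200,) mismatch:
import numpy as np
import matplotlib.pyplot as plt
y_axis = np.random.rand(4200)       # placeholder for your 4200 y values
x = np.linspace(0, 1, len(y_axis))  # one x value per y value, from 0 to 1
plt.plot(x, y_axis)
plt.show()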
I have some dynamically created arrays of varying lengths, and I would like to resize them all to the same 5000-element length by popping every n-th element.
Here is what I got so far:
import numpy as np
random_array = np.random.rand(26975,3)
n_to_pop = int(len(random_array) / 5000)
print(n_to_pop)
If I downsample by keeping every n-th element (n = 5), I get 5395 elements.
I can do 5395 / 5000 = 1.07899, but I don't know how to work out how often I should pop an element to remove the remaining 395 elements.
If I can get to a length within 5000-5050 that would also be acceptable; the remainder can then be dropped with a simple .resize.
This is probably just a simple math question, but I couldn't seem to find an answer anywhere.
Any help is much appreciated.
Best regards
Martin
You can use np.linspace to pick the indices you keep as uniformly as possible:
subset = random_array[np.round(np.linspace(0, len(random_array), 5000, endpoint=False)).astype(int)]
Dropping every n-th element with a fixed integer n doesn't always work: compare reducing a 5003-element array to 5000 elements with reducing a 50003-element array to 5000. The trick is to choose the set of indices to keep (or drop) so that they are spread as evenly as possible across the array, which is exactly what np.linspace does.
You could also go the other way and delete near-evenly spaced indices (pass axis=0 so whole rows are removed):
np.delete(random_array, np.round(np.linspace(0, len(random_array), len(random_array) - 5000, endpoint=False)).astype(int), axis=0)
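As a quick sanity check with the shapes from the question (26975 rows down to 5000), both variants land on exactly 5000 rows:
import numpy as np
random_array = np.random.rand(26975, 3)
# keep 5000 near-evenly spaced rows
keep = np.round(np.linspace(0, len(random_array), 5000, endpoint=False)).astype(int)
print(random_array[keep].shape)                      # (5000, 3)
# or drop len - 5000 near-evenly spaced rows
drop = np.round(np.linspace(0, len(random_array), len(random_array) - 5000, endpoint=False)).astype(int)
print(np.delete(random_array, drop, axis=0).shape)   # (5000, 3)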
You can also take a random subset using np.random.choice or np.random.permutation:
random_array[np.random.permutation(random_array.shape[0])[:5000]]
If instead you want to remove rows near-uniformly, one way is:
indices = np.linspace(0, random_array.shape[0], endpoint=False, num=5000, dtype=int)
# [ 0 5 10 16 ... 26958 26964 26969] --> shape = (5000,)
result = random_array[indices]
In my original code I have the following function:
B = np.inner(A,x)
where A.shape = (307_200,) and A contains only -1 or 1,
where x.shape = (307_200,) and x contains integers from 0 to 255,
and where B is a single integer with a large value.
Assuming I know A and B, but don't know x, how can I solve for x?
To simplify the problem...
import numpy as np
A = np.random.choice(a=[-1,1], size=10)
x = np.random.choice(a=range(0,256), size=10)
B = np.inner(A, x)
I want to solve for x now. So something like one of the following...
x_solved = np.linalg.solve(A, B)
x_solved = np.linalg.lstsq(A, B)
Is it possible?
Extra info...
I could change A to be an n x m matrix, but since I am dealing with large matrices, I quickly run out of memory when I try to use lstsq. This is bad because 1. I can't run it on my local machine and 2. the end-use application needs to limit RAM.
However, for the problem above, I can accept RAM-intensive solutions, since I might be able to moderate the compute resources with some clever tricks.
Also, we could switch A to boolean values if that would help.
Apologies if the solution is obvious or simple.
Thanks for the help.
Here is your problem re-stated:
I have an array A containing many 1s and -1s. I want to make another array x containing integers 0-255 so that when I multiply each entry of x by the corresponding entry of A, then add up all the products, I get some target number B.
Notice that the problem is just as difficult if you shuffle the array elements. So let's shuffle them so all the 1s are at the start and all the -1s are at the end. After solving this simplified version of the problem, we can shuffle them back.
Now the simplified problem is this:
I have some number of 1s and some number of -1s. I want to make two arrays, x1 (for the positions that hold 1) and x-1 (for the positions that hold -1), containing numbers from 0-255, so that when I add up all the numbers in x1 and subtract all the numbers in x-1 I get some target number B.
Can you work out how to solve this?
I'd start by filling x1 with 255s until the next 255 would push the sum past the target, then fill the next entry with whatever number makes the sum equal the target, and fill the rest with 0s. Then fill x-1 with 0s. If the target number is negative, do the opposite. Then un-shuffle: match the entries of x1 and x-1 back up with the positions of the 1s and -1s in your array A. And you're done.
You can actually write that algorithm so it puts the numbers directly in x without needing to make the temporary arrays x1 and x-1.
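A minimal sketch of that greedy fill (the solve_for_x helper below is just illustrative, and it assumes B is actually reachable, i.e. |B| is at most 255 times the number of entries whose sign matches B):
import numpy as np
def solve_for_x(A, B, max_val=255):
    """Return one x with entries in [0, max_val] such that np.inner(A, x) == B."""
    x = np.zeros(len(A), dtype=int)
    sign = 1 if B >= 0 else -1
    remaining = abs(B)
    # only the entries whose sign pushes the sum toward B get non-zero values
    for i in np.where(A == sign)[0]:
        x[i] = min(max_val, remaining)
        remaining -= x[i]
        if remaining == 0:
            break
    if remaining != 0:
        raise ValueError("B is not reachable with entries in [0, max_val]")
    return x
A = np.random.choice([-1, 1], size=10)
x_true = np.random.choice(range(256), size=10)
B = np.inner(A, x_true)
x_solved = solve_for_x(A, B)
print(np.inner(A, x_solved) == B)   # True, although x_solved is usually not x_true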
I came up with a method to solve the following problem, but it becomes very time-consuming as the data size grows. Is there a better way to solve it, ideally with matrix operations?
The question:
Suppose we have 1 score-matrix and 3 value-matrices.
Each of them is a square matrix of the same size (N x N).
An element of the score-matrix is the weight between two entities. For example, S12 is the score between entity 1 and entity 2. (Weights are only meaningful when greater than 0.)
An element of a value-matrix is the value between two entities. For example, V12 is the value between entity 1 and entity 2. Since we have 3 value-matrices, we have 3 different values for V12.
The target: multiply the values by the corresponding weights, so that the final output is an (N x 3) matrix.
My solution: I solved this as follows, but the two for-loops make the program very time-consuming (e.g. when N is large, or when the 3 value-matrices become 100). Is there any way to improve this code? Any suggestions or hints would be much appreciated. Thank you in advance!
# generate sample data
import numpy as np
score_mat = np.random.randint(low=0, high=4, size=(2,2))
value_mat = np.random.randn(3,2,2)
# solve problem
# init the output info
output = np.zeros((2, 3))
# update the output info
for entity_1 in range(2):
    # consider meaningful score
    entity_others_list = np.where(score_mat[entity_1, :] > 0)[0].tolist()
    # iterate every other entity
    for entity_2 in entity_others_list:
        vec = value_mat[:, entity_1, entity_2].copy()
        vec *= score_mat[entity_1, entity_2]
        output[entity_1] += vec
You don't need to iterate manually: just multiply score_mat by value_mat (broadcasting applies it to each of the three value matrices), then call sum on axis=2 and again on axis=1.
As you mentioned, a score only makes sense when it is greater than zero, so you can first zero out the non-positive scores; those entries then drop out of the sum:
>>> score_mat[score_mat <= 0] = 0
>>> (score_mat*value_mat).sum(axis=2).sum(axis=1)
array([-0.58826032, -3.08093186, 10.47858256])
Break-down:
# This is what the randomly generated numpy arrays look like:
>>> score_mat
array([[3, 3],
[1, 3]])
>>> value_mat
array([[[ 0.81935985, 0.92228075],
[ 1.07754964, -2.29691059]],
[[ 0.12355602, -0.36182607],
[ 0.49918847, -0.95510339]],
[[ 2.43514089, 1.17296263],
[-0.81233976, 0.15553725]]])
# When you multiply the matrices, each inner matrix in value_mat will be multiplied
# element-wise by score_mat
>>> score_mat*value_mat
array([[[ 2.45807955, 2.76684225],
[ 1.07754964, -6.89073177]],
[[ 0.37066806, -1.08547821],
[ 0.49918847, -2.86531018]],
[[ 7.30542266, 3.51888789],
[-0.81233976, 0.46661176]]])
# Now calling sum on axis=2 will give the sum of each row in the innermost matrices
>>> (score_mat*value_mat).sum(axis=2)
array([[ 5.22492181, -5.81318213],
[-0.71481015, -2.36612171],
[10.82431055, -0.34572799]])
# Finally, calling sum on axis=1 will sum those row sums, giving one total per value matrix
>>> (score_mat*value_mat).sum(axis=2).sum(axis=1)
array([-0.58826032, -3.08093186, 10.47858256])
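Note that the chained sums collapse everything down to one number per value matrix (shape (3,)). If you want the (N x 3) output of the original loop, one vectorized sketch (using np.einsum, with the same zero-masking of non-positive scores) could be:
import numpy as np
N, K = 2, 3                                      # entities, number of value matrices
score_mat = np.random.randint(0, 4, size=(N, N))
value_mat = np.random.randn(K, N, N)
masked = np.where(score_mat > 0, score_mat, 0)   # ignore non-positive scores
# output[i, k] = sum over j of masked[i, j] * value_mat[k, i, j]
output = np.einsum('ij,kij->ik', masked, value_mat)
print(output.shape)                              # (2, 3)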
I am to write a function that takes 3 arguments: Matrix 1, Matrix 2, and a number p. The function should output the number of entries in which the difference between Matrix 1 and Matrix 2 is bigger than p. I was instructed not to use loops.
I was advised to use the X.sum() function, where X is an ndarray.
I don't know what to do here.
The first thing I want to do is subtract M2 from M1. That gives me a matrix of differences, each of which either is or is not bigger than p.
I tried to find a way to use the sum function, but I couldn't see how it helps here.
The only thing I can think of is going through the entries one by one, which I am not allowed to do. I would appreciate your help with this. No recursion is allowed either.
import pandas as pd
# Pick value of P
p = 20
# Instantiate fake frames
a = pd.DataFrame({'foo':[4, 10], 'bar':[34, -12]})
b = pd.DataFrame({'foo':[64, 0], 'bar':[21, 354]})
# Get absolute value of difference
c = (b - a).applymap(abs)
# Boolean slice, then sum along each axis to get total number of "True"s
c.applymap(lambda x: x > p).sum().sum()
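If the inputs are plain NumPy arrays, as the hint about X.sum() suggests, the same idea is a one-liner (assuming, like above, that the comparison is meant on the absolute difference):
import numpy as np
M1 = np.array([[4, 34], [10, -12]])
M2 = np.array([[64, 21], [0, 354]])
p = 20
# comparing gives a boolean array; summing it counts the True entries
count = (np.abs(M1 - M2) > p).sum()
print(count)   # 2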
I have an array of floating-point numbers, which is unordered. I know that the values always fall around a few points, which are not known. For illustration, this list
[10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
has values clustered around 5 and 10, so I would like [5,10] as answer.
I would like to find those clusters for lists with 1000+ values, where the number of clusters is probably around 10 (for some given tolerance). How can I do that efficiently?
Check out python-cluster. With this library you could do something like this:
from cluster import *
data = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
cl = HierarchicalClustering(data, lambda x,y: abs(x-y))
print([mean(cluster) for cluster in cl.getlevel(1.0)])
And you would get:
[5.0062, 10.003333333333332]
(This is a very silly example, because I don't really know what you want to do, and because this is the first time I've used this library)
You can try the following method:
Sort the array first, then use diff() to compute the differences between consecutive values. Differences larger than a threshold can be treated as split positions:
import numpy as np
x = [10.01,5.001,4.89,5.1,9.9,10.1,5.05,4.99]
x = np.sort(x)
th = 0.5
print([group.mean() for group in np.split(x, np.where(np.diff(x) > th)[0]+1)])
the result is:
[5.0061999999999998, 10.003333333333332]
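If you literally want [5, 10] rather than the exact means, you could round the per-group means (reusing x and th from above):
print([int(round(group.mean())) for group in np.split(x, np.where(np.diff(x) > th)[0]+1)])
# [5, 10]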