In NumPy, I have an array of shape (2, 2, 2) like this:
[[[0.2,0.3],[0.1,0.5]],[[0.1,0.3],[0.1,0.4]]]
I'd like to scale the array so that the max value along the last dimension is 1, like this:
Since max([0.2, 0.1, 0.1, 0.1]) is 0.2, and 1/0.2 is 5, multiply the first element of each inner pair by 5.
Since max([0.3, 0.5, 0.3, 0.4]) is 0.5, and 1/0.5 is 2, multiply the second element of each inner pair by 2.
So the final array is like this:
[[[1,0.6],[0.5,1]],[[0.5,0.6],[0.5,0.8]]]
I know how to multiply an array by an integer in numpy, but I'm not sure how to multiply the array by different factors. Does anyone have ideas about this?
If your array is a:
>>> import numpy as np
>>> a = np.array([[[0.2,0.3],[0.1,0.5]],[[0.1,0.3],[0.1,0.4]]])
You can do this:
>>> a / np.amax(a.reshape(4, 2), axis=0)
array([[[ 1. ,  0.6],
        [ 0.5,  1. ]],

       [[ 0.5,  0.6],
        [ 0.5,  0.8]]])
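For reference, a reshape-free sketch of the same scaling that generalizes to any (n, m, k) shape: reduce over the first two axes and keep the dimensions so the division broadcasts.

import numpy as np

a = np.array([[[0.2, 0.3], [0.1, 0.5]],
              [[0.1, 0.3], [0.1, 0.4]]])

# Max over axes 0 and 1 leaves one maximum per last-axis position;
# keepdims makes it shape (1, 1, 2) so a / max broadcasts cleanly.
scaled = a / a.max(axis=(0, 1), keepdims=True)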
I have a np.array of 50 elements. For example:
data = np.array([9.22, 9. , 9.01, ..., 7.98, 6.77, 7.3 ])
For each element of the data np.array, I have an x and y data pair (both with the same length) that I want to interpolate with. For example:
x = np.array([[1,    2,    3,    4,    5   ],
              ...,
              [1.01, 2.01, 3.02, 4.03, 5.07]])
y = np.array([[0., 1.,   0.95, ..., 0.07, 0.06, 0.06],
              ...,
              [0., 0.99, 0.85, ..., 0.03, 0.05, 0.06]])
I want to interpolate each data element with the respective np.array of x and y.
I have the following solution using map():
def cubic_spline(i):
    return scipy.interpolate.splev(x=data[i],
                                   tck=scipy.interpolate.splrep(x[i], y[i], k=3))

list(map(cubic_spline, np.arange(len(data))))
But I'm wondering if there is a way to do it directly with scipy and numpy to optimize the execution time. Something like:
scipy.interpolate.splev(x=data,
                        tck=scipy.interpolate.splrep(x, y, k=3))
Any suggestions will be appreciated. Thanks in advance.
If you have a single x array and multiple y arrays, newer interpolators (make_interp_spline, PchipInterpolator, etc.) support multidimensional y arrays automatically; see the sketch below.
If you really have a collection of pairs of 1D arrays, x and y, where x arrays differ, and you want scipy to loop over these datasets, then no, scipy does not support that. You'd need to loop over them manually.
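For the single-x case, a minimal sketch with make_interp_spline (the array names and shapes here are assumptions for illustration):

import numpy as np
from scipy.interpolate import make_interp_spline

x = np.linspace(1, 5, 5)                  # one shared grid, shape (5,)
y = np.random.rand(50, 5)                 # 50 datasets over that grid
data = np.random.uniform(1, 5, size=50)   # one evaluation point per dataset

# make_interp_spline accepts N-D y; axis=1 fits all 50 cubic splines
# in a single call.
spl = make_interp_spline(x, y, k=3, axis=1)

# Evaluating every spline at all 50 points gives a (50, 50) array;
# each dataset's own evaluation point sits on the diagonal.
result = np.diag(spl(data))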
I just cannot find a solution to the following problem:
Consider two NumPy arrays, one of shape (10, 64, 10) and one of shape (x, 64).
Array A (10, 64, 10) represents 10 classes with 64 features, and over each of these features I have a PDF split into 10 bins --> (Classes, Features, Bins). Each value in the innermost array represents a probability.
[[[0.62, 0.  , 0.  ],
  [0.12, 0.09, 0.01],
  [0.59, 0.01, 0.  ],
  [0.62, 0.  , 0.  ]],

 [[0.62, 0.  , 0.  ],
  [0.59, 0.01, 0.  ],
  [0.62, 0.  , 0.  ],
  [0.62, 0.  , 0.  ]]]
(simplified to (2, 4, 3) so you can test it by copying it directly; the represented classes are "0" and "1")
Array B (X, 64) holds the instances of a dataset, where entry i is the bin index the i-th feature falls into:
[[0, 0, 2, 1],
 [0, 0, 1, 0],
 [0, 2, 1, 0]]
(simplified to (X=3, 4))
What I want to do is, for each row in Array B, e.g. [0, 0, 2, 1], look up per feature the probability stored in the given bin, once for class "0" and once for class "1".
The expected output for the first instance here would be:
"0" = [0.62, 0.12, 0.00, 0.00]
"1" = [0.62, 0.59, 0.00, 0.00]
and if possible then for all X instances.
I do not expect a dictionary or anything like that, just some array that contains the values in a consistent order (it can also be ordered differently than shown in the example).
Of course, I could do all this in giant nested for-loops, but I want at least some vectorization. Does anybody have any good suggestions? Your answer does not have to be a full-fledged solution.
EDIT:
The best nested loop I came up with was:
prediction = np.empty((bins.shape[0], histograms.shape[1], histograms.shape[0]))
for n, instance in enumerate(bins):
    for i, instance_bin in enumerate(instance):
        # Prob. for every class that the bin given in "instance_bin" of
        # feature "i" corresponds to a possible instance of that class
        prediction[n, i] = histograms[:, i, instance_bin]

where histograms is Array A and bins is Array B.
Please also tell me about any other bad practice you find in my way of working with NumPy, or anything else in this snippet.
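For reference, a vectorized version of the loop above, as a sketch using advanced indexing (histograms and bins as defined above):

import numpy as np

histograms = np.array([[[0.62, 0.  , 0.  ],
                        [0.12, 0.09, 0.01],
                        [0.59, 0.01, 0.  ],
                        [0.62, 0.  , 0.  ]],
                       [[0.62, 0.  , 0.  ],
                        [0.59, 0.01, 0.  ],
                        [0.62, 0.  , 0.  ],
                        [0.62, 0.  , 0.  ]]])
bins = np.array([[0, 0, 2, 1],
                 [0, 0, 1, 0],
                 [0, 2, 1, 0]])

# The two advanced indices broadcast to bins' shape (X, features),
# giving a (classes, X, features) result; transposing matches the
# loop's (X, features, classes) output.
features = np.arange(histograms.shape[1])
prediction = histograms[:, features[None, :], bins].transpose(1, 2, 0)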
I have to round every element inside a NumPy array to either .5 or .0. I know the np.around() method, however it is not useful for this specific task since I can only use it to set a whole-digit decimal precision.
Here there is an example of what I should do:
x = np.array([2.99845, 4.51845, 0.33365, 0.22501, 2.48523])
x_rounded = some_function(x)
>>> x_rounded
array([3.0, 4.5, 0.5, 0.0, 2.5])
Is there a built-in method to do this, or do I have to create one?
If I have to write that method myself, is there an efficient way? I'm working on a big dataset, so I would like to avoid iterating over each element.
import numpy as np
x = np.array([2.99845, 4.51845, 0.33365, 0.22501, 2.48523])
np.round(2 * x) / 2
Output:
array([3. , 4.5, 0.5, 0. , 2.5])
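One caveat: np.round rounds exact halves to the nearest even value ("banker's rounding"), so inputs sitting exactly halfway between two 0.5 steps snap to the even half:

import numpy as np

# 2 * 0.25 = 0.5 rounds to 0. (even); 2 * 0.75 = 1.5 rounds to 2. (even)
print(np.round(2 * np.array([0.25, 0.75])) / 2)   # [0. 1.]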
I am trying to take the reciprocal of every non-zero value in a numpy array but am messing something up. Suppose:
norm = np.arange(0,11)
I would like the resulting np.array to be (keeping the zeros in place)
[ 0, 1, 0.5 , 0.33, 0.25, 0.2 , 0.17, 0.14, 0.12, 0.11, 0.1]
If I set
mask = norm !=0
and I try
1/norm[mask]
I receive the expected result of
[1, 0.5 , 0.33, 0.25, 0.2 , 0.17, 0.14, 0.12, 0.11, 0.1]
However, I'm trying to understand why, when I try the following assignment,
norm[mask] = 1/norm[mask]
I get the following numpy array.
[0,1,0,0,0,0,0,0,0,0,0]
Any ideas on why this happens, or how to achieve the desired np.array?
Are you sure you didn't accidentally change the value of norm?
Both
mask = norm != 0
norm[mask] = 1 / norm[mask]
and
norm[norm != 0] = 1 / norm[norm != 0]
do exactly what you want them to do. I also tried it using mask on the left side and norm != 0 on the right side, like you do above (why?), and it works fine.
EDIT BY FY: I misread the example. I thought the original poster was starting with [0, .5, .333, .25] rather than with [0, 1, 2, ..., 10]. The poster is accidentally creating an int64 array rather than a floating-point array, so everything rounds down to zero. Change it to np.arange(0., 11.).
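A quick sketch of that fix, showing that the float dtype preserves the reciprocals:

import numpy as np

norm = np.arange(0., 11.)   # float dtype instead of int64
mask = norm != 0
norm[mask] = 1 / norm[mask]
print(norm.round(2))
# [0.   1.   0.5  0.33 0.25 0.2  0.17 0.14 0.12 0.11 0.1 ]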
Another option is using numpy.reciprocal, as documented here, with its where parameter. Note that with where alone the entries that fail the condition are left uninitialized, so pass an explicit out array to keep the zeros:
import numpy as np
data = np.reciprocal(data, where=data != 0, out=np.zeros_like(data))
example:
In [1]: data = np.array([2.0, 4.0, 0.0])
In [2]: np.reciprocal(data, where=data != 0, out=np.zeros_like(data))
Out[2]: array([0.5 , 0.25, 0.  ])
Notice that this function is not intended to work with ints, which is why the values above are initialized with the .0 suffix.
If you're not sure of the type, you can always use data.astype(np.float64).
I am seeing something strange while using AffinityPropagation from sklearn. I have a 4 x 4 numpy ndarray, which is basically the affinity scores: sim[i, j] has the affinity score of (i, j). Now, when I feed it into the AffinityPropagation function, I get a total of 4 labels.
Here is a similar example with a smaller matrix:
In [215]: x = np.array([[1, 0.2, 0.4, 0], [0.2, 1, 0.8, 0.3],
     ...:               [0.4, 0.8, 1, 0.7], [0, 0.3, 0.7, 1]])
In [216]: x
Out[216]:
array([[ 1. ,  0.2,  0.4,  0. ],
       [ 0.2,  1. ,  0.8,  0.3],
       [ 0.4,  0.8,  1. ,  0.7],
       [ 0. ,  0.3,  0.7,  1. ]])
In [217]: clusterer = cluster.AffinityPropagation(affinity='precomputed')
In [218]: f = clusterer.fit(x)
In [219]: f.labels_
Out[219]: array([0, 1, 1, 1])
This says (according to Kevin) that the first sample (the 0th-indexed row) is a cluster of its own (cluster #0) and the rest of the samples are in another cluster (cluster #1). But I still do not understand this output. What is a sample here? What are the members? I want to have one set of pairs (i, j) assigned to one cluster, another set of pairs assigned to another cluster, and so on.
It looks like it is being treated as a 4-sample x 4-feature matrix, which I do not want. Is this the problem? If so, how do I convert it to a proper 4-sample x 4-sample affinity matrix?
The documentation (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html) says
fit(X, y=None)
    Create affinity matrix from negative euclidean distances, then apply
    affinity propagation clustering.

    Parameters:
        X : array-like, shape (n_samples, n_features) or (n_samples, n_samples)
            Data matrix or, if affinity is precomputed, matrix of
            similarities / affinities.
Thanks!
By your description it sounds like you are working with a "pairwise similarity matrix": x (although your example data does not show that). If this is the case, your matrix should be symmetric so that sim[i, j] == sim[j, i], with your diagonal values equal to 1. Example similarity data S:
S
array([[ 1.        ,  0.08276253,  0.16227766,  0.47213595,  0.64575131],
       [ 0.08276253,  1.        ,  0.56776436,  0.74456265,  0.09901951],
       [ 0.16227766,  0.56776436,  1.        ,  0.47722558,  0.58257569],
       [ 0.47213595,  0.74456265,  0.47722558,  1.        ,  0.87298335],
       [ 0.64575131,  0.09901951,  0.58257569,  0.87298335,  1.        ]])
Typically, when you already have a distance matrix, you should use affinity='precomputed'. But in your case you are using similarity. In this specific example you can transform to a pseudo-distance using 1 - S. (The reason to do this is that I don't know whether Affinity Propagation will give you the expected results if you hand it a similarity matrix as input):
1 - S
array([[ 0.        ,  0.91723747,  0.83772234,  0.52786405,  0.35424869],
       [ 0.91723747,  0.        ,  0.43223564,  0.25543735,  0.90098049],
       [ 0.83772234,  0.43223564,  0.        ,  0.52277442,  0.41742431],
       [ 0.52786405,  0.25543735,  0.52277442,  0.        ,  0.12701665],
       [ 0.35424869,  0.90098049,  0.41742431,  0.12701665,  0.        ]])
With that being said, I think this is where your interpretation was off:
This says that the first 3 rows are similar, the 4th row is a cluster on its own, and the 5th row is also a cluster on its own. A total of 3 clusters.
The f.labels_ array:
array([0, 1, 1, 1, 0])
is telling you that samples (not rows) 0 and 4 are in cluster 0, AND that samples 1, 2, and 3 are in cluster 1. You don't need 25 different labels for a 5-sample problem; that wouldn't make sense. Hope this helps a little; try the demo (inspect the variables along the way and compare them with your data), which starts with raw data. It should help you decide whether Affinity Propagation is the right clustering algorithm for you.
According to this page https://scikit-learn.org/stable/modules/clustering.html, you can use a similarity matrix for AffinityPropagation.
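For completeness, a minimal sketch of the precomputed route using the 4x4 similarity matrix from the question (random_state is an assumption; recent scikit-learn versions ask for it explicitly):

import numpy as np
from sklearn.cluster import AffinityPropagation

S = np.array([[1.0, 0.2, 0.4, 0.0],
              [0.2, 1.0, 0.8, 0.3],
              [0.4, 0.8, 1.0, 0.7],
              [0.0, 0.3, 0.7, 1.0]])

# With affinity='precomputed', S is interpreted as similarities
# (higher = more alike), one row/column per sample.
clusterer = AffinityPropagation(affinity='precomputed', random_state=0)
labels = clusterer.fit_predict(S)   # one cluster label per sample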