Related
I am trying to write out a covariance calculation for the following example, and i know there has to be a better way than a for loop. I've looked into np.dot, np.einsum and i feel like np.einsum has the capability but i am just missing something for implementing it.
import numpy as np
# this is mx3
a = np.array([[1,2,3],[4,5,6]])
# this is x3
mean = a.mean(axis=0)
# result should be 3x3
b = np.zeros((3,3))
for i in range(a.shape[0]):
b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
b
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
so this is fine for a 2 data point sample but for a m=large number this is super slow. There has to be a better way. Any suggestions?
In [108]: a = np.array([[1,2,3],[4,5,6]])
...: # this is x3
...: mean = a.mean(axis=0)
...:
...: # result should be 3x3
...: b = np.zeros((3,3))
...: for i in range(a.shape[0]):
...: b = b + (a[i]-mean).reshape(3,1) * (a[i]-mean)
...:
In [109]: b
Out[109]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
In [110]: a.mean(axis=0)
Out[110]: array([2.5, 3.5, 4.5])
Since the mean is subtracted twice, lets define a new variable. In this case the 2d and 1d dimensions broadcast, so we can simply:
In [111]: a1= a - a.mean(axis=0)
In [112]: a1
Out[112]:
array([[-1.5, -1.5, -1.5],
[ 1.5, 1.5, 1.5]])
The rest is a normal dot product:
In [113]: a1.T#a1
Out[113]:
array([[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5],
[4.5, 4.5, 4.5]])
np.einsum and np.dot can also do this matrix multiplication.
I have a numpy array([1.0, 2.0, 3.0]), which is actually a mesh in 1 dimension in my problem. What I want to do is to refine the mesh to get this: array([0.8, 0.9, 1, 1.1, 1.2, 1.8, 1.9, 2, 2.1, 2.2, 2.8, 2.9, 3, 3.1, 3.2,]).
The actual array is very large and this procedure costs a lot of time. How to do this quickly (maybe vectorize) in python?
Here's a vectorized approach -
(a[:,None] + np.arange(-0.2,0.3,0.1)).ravel() # a is input array
Sample run -
In [15]: a = np.array([1.0, 2.0, 3.0]) # Input array
In [16]: (a[:,None] + np.arange(-0.2,0.3,0.1)).ravel()
Out[16]:
array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
2.9, 3. , 3.1, 3.2])
Here are a few options(python 3):
Option 1:
np.array([j for i in arr for j in np.arange(i - 0.2, i + 0.25, 0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 2:
np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Option 3:
np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# array([ 0.8, 0.9, 1. , 1.1, 1.2, 1.8, 1.9, 2. , 2.1, 2.2, 2.8,
# 2.9, 3. , 3.1, 3.2])
Timing on a larger array:
arr = np.arange(100000)
arr
# array([ 0, 1, 2, ..., 99997, 99998, 99999])
%timeit np.array([j for i in arr for j in np.arange(i-0.2, i+0.25, 0.1)])
# 1 loop, best of 3: 615 ms per loop
%timeit np.array([j for x, y in zip(arr - 0.2, arr + 0.25) for j in np.arange(x,y,0.1)])
# 1 loop, best of 3: 250 ms per loop
%timeit np.array([arr + i for i in np.arange(-0.2, 0.25, 0.1)]).T.ravel()
# 100 loops, best of 3: 1.93 ms per loop
I know that this question might seem repeated, but I tried to debug my code in several ways and still don't know what it is wrong. Below is my code.
def myfunc(LUT,LUT_Prob,test):
x = []
y = []
z = []
x.extend(hamming_distance(test, LUT[i]) for i in range (len(LUT)))
y = [(len(LUT[0])) - j for j in x]
z = [a*b for a,b in zip(y,LUT_prob)]
MAP = max(z)
closest_index = z.index(max(z))
return x, y, LUT_Prob, z, MAP, closest_index
In another script:
Winner = []
for j in range (0,5):
Winner.append(myfunc(LUT1,LUT_Prob1,test[j]))
print 'Winner = {}' .format(Winner)
The output is:
Winner = [([2, 4, 2, 4], [8, 6, 8, 6], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 3.2, 1.6, 1.6, 1.6])]], [array([ 3.2, 1.6, 1.6, 1.6])], 0), ([1, 3, 1, 3], [9, 7, 9, 7], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 3.6, 1.8, 1.8, 1.8])]], [array([ 3.6, 1.8, 1.8, 1.8])], 0), ([3, 5, 5, 3], [7, 5, 5, 7], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0), ([3, 5, 3, 5], [7, 5, 7, 5], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0), ([3, 3, 3, 1], [7, 7, 7, 9], [array([ 0.4, 0.2, 0.2, 0.2])], [[array([ 2.8, 1.4, 1.4, 1.4])]], [array([ 2.8, 1.4, 1.4, 1.4])], 0)]
Note: The output is the returned values x, y, LUT_Prob, z, MAP, closest_index with the same order and iterated 5 times.
The errors that I am getting:
1- z is not as expected, the expectation is multiply y and LUT_Prob element wise what I am getting is the results of multiplying the first element of y by LUT_Prob.
2- MAP should be only one value that is in this case "3.2" however there is an array instead.
3- Max_index in this case is correct, however, if the the "3.2" is anywhere else Max_index remains "0".
So, can somebody help?
Is there an easy way to obtain the values of the levels produced by pandas.cut?
For example:
import pandas as pd
x = pd.cut(np.arange(0,20), 10)
x
Out[1]:
(-0.019, 1.9]
(-0.019, 1.9]
(1.9, 3.8]
(1.9, 3.8]
(3.8, 5.7]
(3.8, 5.7]
(5.7, 7.6]
(5.7, 7.6]
(7.6, 9.5]
(7.6, 9.5]
(9.5, 11.4]
(9.5, 11.4]
(11.4, 13.3]
(11.4, 13.3]
(13.3, 15.2]
(13.3, 15.2]
(15.2, 17.1]
(15.2, 17.1]
(17.1, 19]
(17.1, 19]
Levels (10): Index(['(-0.019, 1.9]', '(1.9, 3.8]', '(3.8, 5.7]',
'(5.7, 7.6]', '(7.6, 9.5]', '(9.5, 11.4]',
'(11.4, 13.3]', '(13.3, 15.2]', '(15.2, 17.1]',
'(17.1, 19]'], dtype=object)
What I would like to get is something like:
x.magic_method
Out[2]:
[[-0.019, 1.9], [1.9, 3.8], [3.8, 5.7],
[5.7, 7.6], [7.6, 9.5], [9.5, 11.4],
[11.4, 13.3], [13.3, 15.2], (15.2, 17.1],
[17.1, 19]]
or some other representation more suitable to manipulation. Instead, we obtain the index by using x.levels, but this representation is a unicode object, so I have to use a couple of loops to get what I want.
UPDATE:
By the way, I need a solution that works with a sequence of values in the second argument: pd.cut(np.arange(0,20), arr)
You can convert from unicode list to an array by following code:
import pandas as pd
x = pd.cut(np.arange(0,20), 10)
np.array(map(lambda t:t[1:-1].split(","), x.levels), float)
You can do this, but prob better to explain what you are actually doing; e.g. you already have the Categorical variable.
In [27]: x, bins = pd.cut(np.arange(0,20), 10, retbins=True)
In [28]: [ [ round(l,3), round(r,3) ] for l, r in zip(bins[:-1],bins[1:]) ]
Out[28]:
[[-0.019, 1.9],
[1.9, 3.8],
[3.8, 5.7],
[5.7, 7.6],
[7.6, 9.5],
[9.5, 11.4],
[11.4, 13.3],
[13.3, 15.2],
[15.2, 17.1],
[17.1, 19.0]]
What is the most efficient way to remove negative elements in an array? I have tried numpy.delete and Remove all specific value from array and code of the form x[x != i].
For:
import numpy as np
x = np.array([-2, -1.4, -1.1, 0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2])
I want to end up with an array:
[0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2]
In [2]: x[x >= 0]
Out[2]: array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
If performance is important, you could take advantage of the fact that your np.array is sorted and use numpy.searchsorted
For example:
In [8]: x[np.searchsorted(x, 0) :]
Out[8]: array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
In [9]: %timeit x[np.searchsorted(x, 0) :]
1000000 loops, best of 3: 1.47 us per loop
In [10]: %timeit x[x >= 0]
100000 loops, best of 3: 4.5 us per loop
The difference in performance will increase as the size of the array increases because np.searchsorted does a binary search that is O(log n) vs. O(n) linear search that x >= 0 is doing.
In [11]: x = np.arange(-1000, 1000)
In [12]: %timeit x[np.searchsorted(x, 0) :]
1000000 loops, best of 3: 1.61 us per loop
In [13]: %timeit x[x >= 0]
100000 loops, best of 3: 9.87 us per loop
In numpy:
b = array[array>=0]
Example:
>>> import numpy as np
>>> arr = np.array([-2, -1.4, -1.1, 0, 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10, 14, 16.2])
>>> arr = arr[arr>=0]
>>> arr
array([ 0. , 1.2, 2.2, 3.1, 4.4, 8.3, 9.9, 10. , 14. , 16.2])
There's probably a cool way to do this is numpy because numpy is magic to me, but:
x = np.array( [ num for num in x if num >= 0 ] )