Let's say you use the dct function, do no manipulation of the data, and then apply the inverse transform; wouldn't the inverted data be the same as the pre-transformed data? Why the floating point issue? Is it a reported issue or is it normal behavior?
In [21]: a = [1.2, 3.4, 5.1, 2.3, 4.5]
In [22]: b = dct(a)
In [23]: b
Out[23]: array([ 33. , -4.98384545, -4.5 , -5.971707 , 4.5 ])
In [24]: c = idct(b)
In [25]: c
Out[25]: array([ 12., 34., 51., 23., 45.])
Does anyone have an explanation why? Of course, a simple c*10**-1 would do the trick, but if you repeat the call of the function to use it on several dimensions, the error gets bigger:
In [37]: a = np.random.rand(3,3,3)
In [38]: d = dct(dct(dct(a).transpose(0,2,1)).transpose(2,1,0)).transpose(2,1,0).transpose(0,2,1)
In [39]: e = idct(idct(idct(d).transpose(0,2,1)).transpose(2,1,0)).transpose(2,1,0).transpose(0,2,1)
In [40]: a
Out[40]:
array([[[ 0.48709809, 0.50624831, 0.91190972],
[ 0.56545798, 0.85695062, 0.62484782],
[ 0.96092354, 0.17453537, 0.17884233]],
[[ 0.29433402, 0.08540074, 0.18574437],
[ 0.09942075, 0.78902363, 0.62663572],
[ 0.20372951, 0.67039551, 0.52292875]],
[[ 0.79952289, 0.48221372, 0.43838685],
[ 0.25559683, 0.39549153, 0.84129493],
[ 0.69093533, 0.71522961, 0.16522915]]])
In [41]: e
Out[41]:
array([[[ 105.21318703, 109.34963575, 196.97249887],
[ 122.13892469, 185.10133376, 134.96712825],
[ 207.55948396, 37.69964085, 38.62994399]],
[[ 63.57614855, 18.44656009, 40.12078466],
[ 21.47488098, 170.42910452, 135.35331646],
[ 44.00557341, 144.80543099, 112.95260949]],
[[ 172.69694529, 104.15816275, 94.69156014],
[ 55.20891593, 85.42617016, 181.71970442],
[ 149.2420308 , 154.48959477, 35.68949734]]])
Here is a link to the doc.
It looks like dct and idct do not normalize by default. Define dct to call fftpack.dct with norm='ortho', and do the same for idct:
In [13]: dct = lambda x: fftpack.dct(x, norm='ortho')
In [14]: idct = lambda x: fftpack.idct(x, norm='ortho')
Once that is done, you will get the original array back after performing the transforms.
In [19]: import numpy
In [20]: a = numpy.random.rand(3,3,3)
In [21]: d = dct(dct(dct(a).transpose(0,2,1)).transpose(2,1,0)).transpose(2,1,0).transpose(0,2,1)
In [22]: e = idct(idct(idct(d).transpose(0,2,1)).transpose(2,1,0)).transpose(2,1,0).transpose(0,2,1)
In [23]: a
Out[23]:
array([[[ 0.51699637, 0.42946223, 0.89843545],
[ 0.27853391, 0.8931508 , 0.34319118],
[ 0.51984431, 0.09217771, 0.78764716]],
[[ 0.25019845, 0.92622331, 0.06111409],
[ 0.81363641, 0.06093368, 0.13123373],
[ 0.47268657, 0.39635091, 0.77978269]],
[[ 0.86098829, 0.07901332, 0.82169182],
[ 0.12560088, 0.78210188, 0.69805434],
[ 0.33544628, 0.81540172, 0.9393219 ]]])
In [24]: e
Out[24]:
array([[[ 0.51699637, 0.42946223, 0.89843545],
[ 0.27853391, 0.8931508 , 0.34319118],
[ 0.51984431, 0.09217771, 0.78764716]],
[[ 0.25019845, 0.92622331, 0.06111409],
[ 0.81363641, 0.06093368, 0.13123373],
[ 0.47268657, 0.39635091, 0.77978269]],
[[ 0.86098829, 0.07901332, 0.82169182],
[ 0.12560088, 0.78210188, 0.69805434],
[ 0.33544628, 0.81540172, 0.9393219 ]]])
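As a side note, the transpose gymnastics can be avoided by applying the 1-D transform along each axis in turn via the axis argument; a small sketch (the helper names dctn/idctn are just illustrative, not part of fftpack):
import numpy as np
from scipy import fftpack

def dctn(a, norm='ortho'):
    # Apply a 1-D DCT along every axis of the array in turn.
    for ax in range(a.ndim):
        a = fftpack.dct(a, axis=ax, norm=norm)
    return a

def idctn(a, norm='ortho'):
    # Apply a 1-D inverse DCT along every axis of the array in turn.
    for ax in range(a.ndim):
        a = fftpack.idct(a, axis=ax, norm=norm)
    return a

a = np.random.rand(3, 3, 3)
assert np.allclose(idctn(dctn(a)), a)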
I am not sure why no normalization was chosen as the default. But with norm='ortho', dct and idct each scale the coefficients by a factor of 1/sqrt(2*N) (or 1/sqrt(4*N) for the first term), which makes the transform pair orthonormal so the round trip recovers the original values. There may be applications where the normalization is needed for dct but not for idct, and vice versa.
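For reference, the unnormalized DCT-III (what idct computes by default) inverts the unnormalized DCT-II only up to a factor of 2*N, which matches the scaling seen in the question. A quick check:
import numpy as np
from scipy import fftpack

a = np.array([1.2, 3.4, 5.1, 2.3, 4.5])
N = len(a)

# Default (unnormalized) round trip comes back scaled by 2*N ...
assert np.allclose(fftpack.idct(fftpack.dct(a)), 2 * N * a)

# ... while the orthonormal round trip recovers the original values.
assert np.allclose(fftpack.idct(fftpack.dct(a, norm='ortho'), norm='ortho'), a)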
I have two arrays, and I want every element of the first to be divided by every element of the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the answer I want. The output I want looks like the following (you can see every element of a divided by every element of b):
In [28]: c = []
In [29]: for i in a:
...: c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1. , 0.5 , 0.33333333]),
 array([2. , 1. , 0.66666667]),
 array([3. , 1.5 , 1. ])]
In [34]: np.array(c)
Out[34]:
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
But I don't like the for loop; it's too slow for big data. Is there a function included in the numpy package, or any good (faster) way to solve this problem?
It is simple to do in pure numpy: you can use broadcasting to calculate the outer product (or any other outer operation) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1. , 0.5 , 0.33333333],
# [2. , 1. , 0.66666667],
# [3. , 1.5 , 1. ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
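As a side note, the same outer division can also be spelled with the ufunc's outer method, which some find more explicit (just an alternative, not a requirement):
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# np.divide is a ufunc, so .outer computes a[i] / b[j] for every pair (i, j).
c = np.divide.outer(a, b)
# array([[1.        , 0.5       , 0.33333333],
#        [2.        , 1.        , 0.66666667],
#        [3.        , 1.5       , 1.        ]])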
First you need to reshape a into a 2D array (the same shape as the output), then repeat it along the dimension you want to loop over. Then vectorized division will work.
>>> a.reshape(-1,1)
array([[1],
[2],
[3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1) / b
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
# Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[ 1. 0.5 0.33333333]
[ 2. 1. 0.66666667]
[ 3. 1.5 1. ]]
I ran the following in Python and expected the columns of E[1] to be the eigenvectors of A, but they are not. Only Sympy.Matrix.eigenvects() seems to get it right. Why this error?
A
Out[194]:
matrix([[-3, 3, 2],
[ 1, -1, -2],
[-1, -3, 0]])
E = np.linalg.eig(A)
E
Out[196]:
(array([ 2., -4., -2.]),
matrix([[ -2.01889132e-16, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, -3.71551690e-16],
[ -8.32050294e-01, 2.73252305e-17, 4.47213595e-01]]))
A*E[1] / E[1]
Out[205]:
matrix([[ 6.59900617, -4. , -2. ],
[ 2. , -4. , -3.88449298],
[ 2. , 8.125992 , -2. ]])
The eigenvectors are correct, within an expected margin of error.
What you discovered is that testing eigenvectors with element-wise division is a bad idea.
A better way is to compute the norm of the difference between matrix*vector and eigenvalue*vector.
NumPy performs its computations in floating point arithmetic, limited to roughly 52 bits of precision (double precision). This means any of its answers may contain numerical errors of relative size around 2**(-52), which is about 2e-16. So, when you see a number like 2e-16 coming from a calculation with numbers of size 1-3, the conclusion is: "that number should probably be zero, and the value we have for it is just noise". And if you divide by that number, noise is all you get.
SymPy, on the other hand, performs symbolic manipulations, so its answer (when it can get one) is exactly what the theory predicts.
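A minimal sketch of the residual check described above, using the same matrix A as in the question:
import numpy as np

A = np.array([[-3,  3,  2],
              [ 1, -1, -2],
              [-1, -3,  0]])
w, v = np.linalg.eig(A)

# Each residual norm should be around 1e-15, i.e. pure floating point noise.
for i in range(len(w)):
    print(i, np.linalg.norm(np.dot(A, v[:, i]) - w[i] * v[:, i]))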
From the np.linalg.eig docs:
The number w is an eigenvalue of a if there exists a vector v such that dot(a,v) = w * v. Thus, the arrays a, w, and v satisfy the equations dot(a[:,:], v[:,i]) = w[i] * v[:,i] for i \in {0,...,M-1}.
With your matrix:
In [1]: A = np.array([[-3, 3, 2],
...: [ 1, -1, -2],
...: [-1, -3, 0]])
...:
In [2]: w,v=np.linalg.eig(A)
In [3]: w
Out[3]: array([ 2., -4., -2.])
In [4]: v
Out[4]:
array([[ -9.39932874e-17, 9.48683298e-01, 8.94427191e-01],
[ 5.54700196e-01, -3.16227766e-01, 1.93473310e-16],
[ -8.32050294e-01, -4.08811066e-17, 4.47213595e-01]])
In [5]: np.dot(A,v)
Out[5]:
array([[ -2.22044605e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -7.77156117e-16],
[ -1.66410059e+00, 4.44089210e-16, -8.94427191e-01]])
In [6]: w*v
Out[6]:
array([[ -1.87986575e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -3.86946619e-16],
[ -1.66410059e+00, 1.63524427e-16, -8.94427191e-01]])
In [7]: np.dot(A,v)-w*v
Out[7]:
array([[ -3.40580301e-17, 8.88178420e-16, 2.22044605e-16],
[ 8.88178420e-16, -6.66133815e-16, -3.90209498e-16],
[ -2.22044605e-16, 2.80564783e-16, -3.33066907e-16]])
In [8]: np.allclose(np.dot(A,v), w*v)
Out[8]: True
So, yes, the documented test is satisfied, within floating point limits.
einsum can be used to highlight the i axis in the dot calculation.
In [10]: np.einsum('...k,ki->...i',A,v)
Out[10]:
array([[ -2.22044605e-16, -3.79473319e+00, -1.78885438e+00],
[ 1.10940039e+00, 1.26491106e+00, -7.77156117e-16],
[ -1.66410059e+00, 3.88578059e-16, -8.94427191e-01]])
When I divide by v (element-wise), the result matches the eigenvalues, 2, -4, -2, except where v and the dot product are virtually 0 (1e-16 or smaller).
In [11]: np.einsum('...k,ki->...i',A,v)/v
Out[11]:
array([[ 2.36234534, -4. , -2. ],
[ 2. , -4. , -4.01686475],
[ 2. , -9.50507681, -2. ]])
I'm reading the book Python for Data Analysis. About NumPy boolean indexing, it says "Selecting data from an array by boolean indexing always creates a copy of the data", so why can I change the original array using boolean indexing? Could anyone help me? Thanks a lot.
Here is the example:
In [86]: data
Out[86]:
array([[-0.048 , 0.5433, -0.2349, 1.2792],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
In [96]: data[data < 0] = 0
In [97]: data
Out[97]:
array([[ 0. , 0.5433, 0. , 1.2792],
[ 0. , 0.5465, 0.0939, 0. ],
[ 0. , 0. , 0.7719, 0.3103],
[ 2.1452, 0.8799, 0. , 0.0672],
[ 0. , 0. , 1.1503, 1.7289],
[ 0.5994, 0.8174, 0. , 0. ]])
Boolean indexing returns a copy of the data, not a view of the original data like the one you get with slices.
>>> b=data[data<0]; b # this is a copy of data
array([-0.048 , -0.2349, -0.268 , -2.0445, -0.047 , -2.026 , -0.0523,
-1.0023, -0.1698, -0.9297, -1.2564])
I can manipulate b and data is preserved.
>>> b[:] = 0; b
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> data
array([[-0.048 , 0.5433, -0.2349, 1.2792],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
Now, for a slice:
>>> a = data[0,:]; a # a is not a copy of data
array([-0.048 , 0.5433, -0.2349, 1.2792])
>>> a[:] = 0; a
array([ 0., 0., 0., 0.])
>>> data
array([[ 0. , 0. , 0. , 0. ],
[-0.268 , 0.5465, 0.0939, -2.0445],
[-0.047 , -2.026 , 0.7719, 0.3103],
[ 2.1452, 0.8799, -0.0523, 0.0672],
[-1.0023, -0.1698, 1.1503, 1.7289],
[ 0.5994, 0.8174, -0.9297, -1.2564]])
However, as you've identified, assignments made via indexed arrays are always made to the original data.
>>> data[data<0] = 1; data
array([[ 1. , 0.5433, 1. , 1.2792],
[ 1. , 0.5465, 0.0939, 1. ],
[ 1. , 1. , 0.7719, 0.3103],
[ 2.1452, 0.8799, 1. , 0.0672],
[ 1. , 1. , 1.1503, 1.7289],
[ 0.5994, 0.8174, 1. , 1. ]])
In a fetch or __getitem__ the boolean indexing does return a copy. But if used immediately before an assignment, it's a __setitem__ case, and the selected values will be changed:
In [196]: data = np.arange(10)
In [197]: d1 = data[data<5]
In [198]: d1 # a copy
Out[198]: array([0, 1, 2, 3, 4])
In [199]: d1[:] = 0
In [200]: data # no change to the original
Out[200]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Masked assignment:
In [201]: data[data<5] = 0
In [202]: data
Out[202]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9]) # changed data
Indirect assignment does nothing:
In [204]: data[data<5][:] = 1
In [205]: data
Out[205]: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])
Think of it as data.__getitem__(mask).__setitem__(slice(None), 1). The __getitem__ returns a copy, which the __setitem__ then changes - but that doesn't change the original.
So if you need to use advanced indexing on the LHS, make sure it is immediately before the assignment. And you can't use two advanced-indexing steps on the LHS.
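As a side note, if you want the masked replacement without assigning through the mask at all, np.where builds a new array instead (a sketch):
import numpy as np

data = np.arange(10)

# Returns a new array; the original data is left untouched.
result = np.where(data < 5, 0, data)
# result: array([0, 0, 0, 0, 0, 5, 6, 7, 8, 9])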
View vs. copy
With basic indexing it is possible to use the original data buffer, and just change attributes like shape and strides. For example:
In [85]: x = np.arange(10)
In [86]: x.shape
Out[86]: (10,)
In [87]: x.strides
Out[87]: (4,)
In [88]: y = x[::2]
In [89]: y.shape
Out[89]: (5,)
In [90]: y.strides
Out[90]: (8,)
y has the same data buffer as x (compare their __array_interface__ dictionaries). x uses all ten 4-byte elements; y uses every other one (its stride steps by 8 bytes instead of 4).
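The comparison mentioned above can be done directly (a sketch); both views report the same buffer address:
import numpy as np

x = np.arange(10)
y = x[::2]

# Both dictionaries carry the same 'data': (address, read-only) entry,
# showing that x and y share one underlying buffer.
print(x.__array_interface__['data'])
print(y.__array_interface__['data'])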
But with advanced indexing you can't express the element selection in terms of shape and strides.
In [98]: z = x[[1,2,6,7,0]]
In [99]: z.shape
Out[99]: (5,)
In [100]: z.strides
Out[100]: (4,)
Items in the original array can be selected in any order and with repetitions. There's no regular pattern. So a copy is required.
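If you want to check the view/copy distinction programmatically, np.shares_memory (available in reasonably recent NumPy versions) is a handy test:
import numpy as np

x = np.arange(10)

print(np.shares_memory(x, x[::2]))        # True  - basic slice is a view
print(np.shares_memory(x, x[[1, 2, 6]]))  # False - advanced indexing copies
print(np.shares_memory(x, x[x < 5]))      # False - boolean indexing copies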
I have an array of numbers:
q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
I want to save these in an array where each entry is stored as (number, proportion of that number), using Python.
Such as: [[0, 0.1667], [1, 0.1667], [2, 0.25], [3, 0.25], [4, 0.1667]].
This is essential to calculate the distribution of the numbers. How can I do this?
I wrote code to save the numbers as (number, number of times it occurred in the list), but I can't figure out how to find the proportion of each number. Thanks.
sorted_sample_values_of_x = unique, counts = np.unique(q1a, return_counts=True)
np.asarray((unique, counts)).T
sorted_x = np.matrix(sorted_sample_values_of_x)
sorted_x = np.transpose(sorted_x)
print('\n' 'Values of x (sorted):' '\n')
print(sorted_x)
You will need to do two things.
Convert the sorted_x array to a float array.
Then divide its counts column by the sum of the counts array.
Example -
In [34]: sorted_x = np.matrix(sorted_sample_values_of_x)
In [35]: sorted_x = np.transpose(sorted_x).astype(float)
In [36]: sorted_x
Out[36]:
matrix([[ 0., 2.],
[ 1., 2.],
[ 2., 3.],
[ 3., 3.],
[ 4., 2.]])
In [37]: sorted_x[:,1] = sorted_x[:,1]/counts.sum()
In [38]: sorted_x
Out[38]:
matrix([[ 0. , 0.16666667],
[ 1. , 0.16666667],
[ 2. , 0.25 ],
[ 3. , 0.25 ],
[ 4. , 0.16666667]])
To store the numbers along with the proportions in a new array, do -
In [41]: sorted_x = np.matrix(sorted_sample_values_of_x)
In [42]: sorted_x = np.transpose(sorted_x).astype(float)
In [43]: ns = sorted_x/np.array([1,counts.sum()])
In [44]: ns
Out[44]:
matrix([[ 0. , 0.16666667],
[ 1. , 0.16666667],
[ 2. , 0.25 ],
[ 3. , 0.25 ],
[ 4. , 0.16666667]])
>>> q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
>>> from collections import Counter
>>> sorted([[x, float(y)/len(q1a)] for (x, y) in Counter(q1a).items()],
... key=lambda x: x[0])
[[0, 0.16666666666666666],
[1, 0.16666666666666666],
[2, 0.25],
[3, 0.25],
[4, 0.16666666666666666]]
In [12]: from collections import Counter
In [13]: a = [1,2,2,2,4,3,1,3,3,4,0,0]
In [14]: counter = Counter(a)
In [15]: sorted( [ [key, float(counter[key])/len(a)] for key in counter ] )
Out[15]:
[[0, 0.16666666666666666],
[1, 0.16666666666666666],
[2, 0.25],
[3, 0.25],
[4, 0.16666666666666666]]
#!/usr/bin/env python
import numpy as np
q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
unique, counts = np.unique(q1a, return_counts=True)
counts = counts.astype(float) # convert to float
counts /= counts.sum() # counts -> proportion
print(np.c_[unique, counts])
Output
[[ 0. 0.16666667]
[ 1. 0.16666667]
[ 2. 0.25 ]
[ 3. 0.25 ]
[ 4. 0.16666667]]
As an alternative to collections.Counter, try collections.defaultdict. This allows you to accumulate the frequencies as you proceed through the input (i.e. it should be more efficient), and it's more readable (IMO).
from collections import defaultdict
q1a = [1,2,2,2,4,3,1,3,3,4,0,0]
n = float(len(q1a))
frequencies = defaultdict(int)
for i in q1a:
    frequencies[i] += 1/n
print(sorted(frequencies.items()))
[(0, 0.16666666666666666), (1, 0.16666666666666666), (2, 0.25), (3, 0.25), (4, 0.16666666666666666)]
A fun alternative using numpy:
print([(val, 1.*np.sum(np.array(q1a) == val)/len(q1a)) for val in np.unique(q1a)])
#[(0, 0.16666666666666666),
#(1, 0.16666666666666666),
#(2, 0.25),
#(3, 0.25),
#(4, 0.16666666666666666)]
The 1. is to force float division
I have an ndarray that looks like this:
array([[ -2.1e+00, -9.89644000e-03],
[ -2.2e+00, 0.00000000e+00],
[ -2.3e+00, 2.33447000e-02],
[ -2.4e+00, 5.22411000e-02]])
What's the most pythonic way to add the integer 2 to the first column, to give:
array([[ -0.1e+00, -9.89644000e-03],
[ -0.2e+00, 0.00000000e+00],
[ -0.3e+00, 2.33447000e-02],
[ -0.4e+00, 5.22411000e-02]])
Edit:
To add 2 to the first column only, do
>>> import numpy as np
>>> x = np.array([[ -2.1e+00, -9.89644000e-03],
[ -2.2e+00, 0.00000000e+00],
[ -2.3e+00, 2.33447000e-02],
[ -2.4e+00, 5.22411000e-02]])
>>> x[:,0] += 2 # : selects all rows, 0 selects first column
>>> x
array([[-0.1, -0.00989644],
[-0.2, 0. ],
[-0.3, 0.0233447 ],
[-0.4, 0.0522411 ]])
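If you prefer not to modify x in place, a broadcasting variant (just a sketch) adds a per-column offset instead:
import numpy as np

x = np.array([[-2.1, -9.89644e-03],
              [-2.2,  0.0],
              [-2.3,  2.33447e-02],
              [-2.4,  5.22411e-02]])

# Adds 2 to column 0 and 0 to column 1, returning a new array.
y = x + np.array([2.0, 0.0])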
For comparison, x + 2 adds 2 to every element of the array, not just the first column:
>>> import numpy as np
>>> x = np.array([[ -2.1e+00, -9.89644000e-03],
[ -2.2e+00, 0.00000000e+00],
[ -2.3e+00, 2.33447000e-02],
[ -2.4e+00, 5.22411000e-02]])
>>> x + 2
array([[-0.1, 1.99010356],
[-0.2, 2. ],
[-0.3, 2.0233447 ],
[-0.4, 2.0522411 ]])
Perhaps the Numpy Tutorial may help you.