Why does python numpy std() make unwanted spaces?

Why does python numpy std() make unwanted spaces? - python

'car3.csv' file download link
import csv
num = open('car3.csv')
nums = csv.reader(num)
nums_list = []
for i in nums:
nums_list.append(i)
import numpy as np
nums_arr = np.array(nums_list, dtype = np.float32)
print(nums_arr)
print(np.std(nums_arr, axis=0))
The result is this.
[[ 1. 1. 2.]
[ 1. 1. 2.]
[ 1. 1. 2.]
...,
[ 0. 0. 5.]
[ 0. 0. 5.]
[ 0. 0. 5.]]
[ 0.5 0.5 1.11803401]
There are lots of spaces that I didn't expected.
How can I handle these anyway?

That is not a spacing problem. What all you need to do is to save the output of the standard deviation. Then, you can access each value like this:
std_arr = np.std(nums_arr, axis=0) # array which holds std of each column
# now, you can access them by indexing:
print(std_arr[0]) # output here is 0.5
print(std_arr[1]) # output here is 0.5
print(std_arr[2]) # output here is 1.118034

Related

Numpy calculate gradients accross matrices

I am using the following to calculate the running gradients between data in the same indexes across multiple matrices:
import numpy as np
array_1 = np.array([[1,2,3], [4,5,6]])
array_2 = np.array([[2,3,4], [5,6,7]])
array_3 = np.array([[1,8,9], [9,6,7]])
flat_1 = array_1.flatten()
flat_2 = array_2.flatten()
flat_3 = array_3.flatten()
print('flat_1: {0}'.format(flat_1))
print('flat_2: {0}'.format(flat_2))
print('flat_3: {0}'.format(flat_3))
data = []
gradient_list = []
for item in zip(flat_1,flat_2,flat_3):
data.append(list(item))
print('items: {0}'.format(list(item)))
grads = np.gradient(list(item))
print('grads: {0}'.format(grads))
gradient_list.append(grads)
grad_array=np.array(gradient_list)
print('grad_array: {0}'.format(grad_array))
This doesn't look like an optimal way of doing this - is there a vectorized way of calculating gradients between data in 2d arrays?

numpy.gradient takes axis as parameter, so you might just stack the arrays, and then calcualte the gradient along a certain axis; For instance, use np.dstack with axis=2; If you need a different shape as result, just use reshape method:
np.gradient(np.dstack((array_1, array_2, array_3)), axis=2)
#array([[[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ]],
# [[ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]]])
Or if flatten the arrays first:
np.gradient(np.column_stack((array_1.ravel(), array_2.ravel(), array_3.ravel())), axis=1)
#array([[ 1. , 0. , -1. ],
# [ 1. , 3. , 5. ],
# [ 1. , 3. , 5. ],
# [ 1. , 2.5, 4. ],
# [ 1. , 0.5, 0. ],
# [ 1. , 0.5, 0. ]])

Insert Numpy Array into Array with extending of the embedding array

First of all, I work with byte array (>= 400x400x1000) bytes.
I wrote a small function which can insert a multidimensional array (or a fraction of) into another one by indicating an offset. This works if the embedded array is smaller than the embedding array (case A). Otherwise the embedded array is truncated (case B).
case A) Inserting a 3x3 into a 5x5 matrix with offset 1,1 would look like this.
[[ 0. 0. 0. 0. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 1. 1. 1. 0.]
[ 0. 0. 0. 0. 0.]]
case B) If the offsets are exceeding the dimensions of the embedding matrix, the smaller array is truncated. E.g. a (-1,-1) offset would result in this.
[[ 1. 1. 0. 0. 0.]
[ 1. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
case C) Now, instead of truncating the embedded array, I want to extend the embedding array (by zeroes) if the embedded array is either bigger than the embedding array or the offsets enforce it (e.g. case B). Is there a smart way with numpy or scipy to solve this?
[[ 1. 1. 1. 0. 0. 0.]
[ 1. 1. 1. 0. 0. 0.]
[ 1. 1. 1. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0.]]
Actually I work with 3D array, but for simplicity I wrote an example for 2D arrays. Current source:
import numpy as np
import nibabel as nib
def addAtPos(mat_bigger, mat_smaller, xyz_coor):
size_sm_x, size_sm_y = np.shape(mat_smaller)
size_gr_x, size_gr_y = np.shape(mat_bigger)
start_gr_x, start_gr_y = xyz_coor
start_sm_x, start_sm_y = 0,0
end_x, end_y = (start_gr_x + size_sm_x), (start_gr_y + size_sm_y)
print(size_sm_x, size_sm_y)
print(size_gr_x, size_gr_y)
print(end_x, end_y)
if start_gr_x < 0:
start_sm_x = -start_gr_x
start_gr_x = 0
if start_gr_y < 0:
start_sm_y = -start_gr_y
start_gr_y = 0
if end_x > size_gr_x:
size_sm_x = size_sm_x - (end_x - size_gr_x)
end_x = size_gr_x
if end_y > size_gr_y:
size_sm_y = size_sm_y - (end_y - size_gr_y)
end_y = size_gr_y
# copy all or a chunk (if offset is small/big enough) of the smaller matrix into the bigger matrix
mat_bigger[start_gr_x:end_x, start_gr_y:end_y] = mat_smaller[start_sm_x:size_sm_x, start_sm_y:size_sm_y]
return mat_bigger
a_gr = np.zeros([5,5])
a_sm = np.ones([3,3])
a_res = addAtPos(a_gr, a_sm, [-2,1])
#print (a_gr)
print (a_res)

Actually there is an easier way to do it.
For your first example of a 3x3 array embedded to a 5x5 one you can do it with something like:
A = np.array([[1,1,1], [1,1,1], [1,1,1]])
(N, M) = A.shape
B = np.zeros(shape=(N + 2, M + 2))
B[1:-1:, 1:-1] = A
By playing with slicing you can select a subset of A and insert it anywhere within a continuous subset of B.
Hope it helps! ;-)

creating a new identical numpy array that does not affect the old array

Ok before we get into the details, I have searched on site for any similar problems but most of them are not solutions that worked for me.
Here is what I tried to do:
my_array = np.zeros([5,5])
for i in range(4):
temp = my_array[:]
temp +=1
So I need to do a try and error on my_array without changing it. This is a simplified version with just some key points. But the trial I made changes both my_array and temp.
So far, the solutions on this web made use of [:] or .copy. I have tried both ways but it still affects my_array.
Any help is appreciated!

copy works:
my_array = np.zeros([5,5])
for i in range(4):
temp = my_array.copy()
temp +=1
print(temp)
#[[ 1. 1. 1. 1. 1.]
# [ 1. 1. 1. 1. 1.]
# [ 1. 1. 1. 1. 1.]
# [ 1. 1. 1. 1. 1.]
# [ 1. 1. 1. 1. 1.]]
print(my_array)
#[[ 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0.]
# [ 0. 0. 0. 0. 0.]]

You need to perform a deepcopy
import copy
temp = copy.deepcopy(my_array)
after this any changes made to temp will not be reflected upon my_array
That should do it.

Predicting missing values in recommender System

I am trying to implement Non-negative Matrix Factorization so as to find the missing values of a matrix for a Recommendation Engine Project. I am using the nimfa library to implement matrix factorization. But can't seem to figure out how to predict the missing values.
The missing values in this matrix is represented by 0.
a=[[ 1. 0.45643546 0. 0.1 0.10327956 0.0225877 ]
[ 0.15214515 1. 0.04811252 0.07607258 0.23570226 0.38271325]
[ 0. 0.14433757 1. 0.07905694 0. 0.42857143]
[ 0.1 0.22821773 0.07905694 1. 0. 0.27105237]
[ 0.06885304 0.47140452 0. 0. 1. 0.13608276]
[ 0.00903508 0.4592559 0.17142857 0.10842095 0.08164966 1. ]]
import nimfa
model = nimfa.Lsnmf(a, max_iter=100000,rank =4)
#fit the model
fit = model()
#get U and V matrices from fit
U = fit.basis()
V = fit.coef()
print numpy.dot(U,V)
But the ans given is nearly same as a and I can't predict the zero values.
Please tell me which method to use or any other implementations possible and any possible resources.
I want to use this function to minimize the error in predicting the values.
error=|| a - UV ||_F + c*||U||_F + c*||V||_F
where _F denotes the frobenius norm

I have not used nimfa before so I cannot answer on exactly how to do that, but with sklearn you can perform a preprocessor to transform the missing values, like this:
In [28]: import numpy as np
In [29]: from sklearn.preprocessing import Imputer
# prepare a numpy array
In [30]: a = np.array(a)
In [31]: a
Out[31]:
array([[ 1. , 0.45643546, 0. , 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0. , 0.14433757, 1. , 0.07905694, 0. ,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0. ,
0.27105237],
[ 0.06885304, 0.47140452, 0. , 0. , 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
In [32]: pre = Imputer(missing_values=0, strategy='mean')
# transform missing_values as "0" using mean strategy
In [33]: pre.fit_transform(a)
Out[33]:
array([[ 1. , 0.45643546, 0.32464951, 0.1 , 0.10327956,
0.0225877 ],
[ 0.15214515, 1. , 0.04811252, 0.07607258, 0.23570226,
0.38271325],
[ 0.26600665, 0.14433757, 1. , 0.07905694, 0.35515787,
0.42857143],
[ 0.1 , 0.22821773, 0.07905694, 1. , 0.35515787,
0.27105237],
[ 0.06885304, 0.47140452, 0.32464951, 0.27271009, 1. ,
0.13608276],
[ 0.00903508, 0.4592559 , 0.17142857, 0.10842095, 0.08164966,
1. ]])
You can read more here.

Error using numpy.logspace() : how to generate numbers spaced evenly on a log-scale

I am trying to use numpy.logspace()to generate 50 values from 1e-10 to 1e-14.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.logspace.html
import numpy as np
x = np.logspace(1e-10, 1e-14, num=50)
print x
The output I get is incorrect:
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
What are my other options?

>>> import numpy as np
>>> np.logspace(-10, -14, 50)
array([ 1.00000000e-10, 8.28642773e-11, 6.86648845e-11,
5.68986603e-11, 4.71486636e-11, 3.90693994e-11,
3.23745754e-11, 2.68269580e-11, 2.22299648e-11,
1.84206997e-11, 1.52641797e-11, 1.26485522e-11,
1.04811313e-11, 8.68511374e-12, 7.19685673e-12,
5.96362332e-12, 4.94171336e-12, 4.09491506e-12,
3.39322177e-12, 2.81176870e-12, 2.32995181e-12,
1.93069773e-12, 1.59985872e-12, 1.32571137e-12,
1.09854114e-12, 9.10298178e-13, 7.54312006e-13,
6.25055193e-13, 5.17947468e-13, 4.29193426e-13,
3.55648031e-13, 2.94705170e-13, 2.44205309e-13,
2.02358965e-13, 1.67683294e-13, 1.38949549e-13,
1.15139540e-13, 9.54095476e-14, 7.90604321e-14,
6.55128557e-14, 5.42867544e-14, 4.49843267e-14,
3.72759372e-14, 3.08884360e-14, 2.55954792e-14,
2.12095089e-14, 1.75751062e-14, 1.45634848e-14,
1.20679264e-14, 1.00000000e-14])

For np.logspace, the bounds are given as exponents to a base, which defaults to 10.0:
>>> np.logspace(-10, -14, num=50, base=10)
array([1.00000000e-10, 8.28642773e-11, 6.86648845e-11, 5.68986603e-11,
4.71486636e-11, 3.90693994e-11, 3.23745754e-11, 2.68269580e-11,
2.22299648e-11, 1.84206997e-11, 1.52641797e-11, 1.26485522e-11,
1.04811313e-11, 8.68511374e-12, 7.19685673e-12, 5.96362332e-12,
4.94171336e-12, 4.09491506e-12, 3.39322177e-12, 2.81176870e-12,
2.32995181e-12, 1.93069773e-12, 1.59985872e-12, 1.32571137e-12,
1.09854114e-12, 9.10298178e-13, 7.54312006e-13, 6.25055193e-13,
5.17947468e-13, 4.29193426e-13, 3.55648031e-13, 2.94705170e-13,
2.44205309e-13, 2.02358965e-13, 1.67683294e-13, 1.38949549e-13,
1.15139540e-13, 9.54095476e-14, 7.90604321e-14, 6.55128557e-14,
5.42867544e-14, 4.49843267e-14, 3.72759372e-14, 3.08884360e-14,
2.55954792e-14, 2.12095089e-14, 1.75751062e-14, 1.45634848e-14,
1.20679264e-14, 1.00000000e-14])
To specify the bounds absolutely, you can use np.geomspace:
>>> np.geomspace(1e-10, 1e-14, num=50)
array([1.00000000e-10, 8.28642773e-11, 6.86648845e-11, 5.68986603e-11,
4.71486636e-11, 3.90693994e-11, 3.23745754e-11, 2.68269580e-11,
2.22299648e-11, 1.84206997e-11, 1.52641797e-11, 1.26485522e-11,
1.04811313e-11, 8.68511374e-12, 7.19685673e-12, 5.96362332e-12,
4.94171336e-12, 4.09491506e-12, 3.39322177e-12, 2.81176870e-12,
2.32995181e-12, 1.93069773e-12, 1.59985872e-12, 1.32571137e-12,
1.09854114e-12, 9.10298178e-13, 7.54312006e-13, 6.25055193e-13,
5.17947468e-13, 4.29193426e-13, 3.55648031e-13, 2.94705170e-13,
2.44205309e-13, 2.02358965e-13, 1.67683294e-13, 1.38949549e-13,
1.15139540e-13, 9.54095476e-14, 7.90604321e-14, 6.55128557e-14,
5.42867544e-14, 4.49843267e-14, 3.72759372e-14, 3.08884360e-14,
2.55954792e-14, 2.12095089e-14, 1.75751062e-14, 1.45634848e-14,
1.20679264e-14, 1.00000000e-14])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why does python numpy std() make unwanted spaces? - python

Related

Numpy calculate gradients accross matrices

Insert Numpy Array into Array with extending of the embedding array

creating a new identical numpy array that does not affect the old array

Predicting missing values in recommender System

Error using numpy.logspace() : how to generate numbers spaced evenly on a log-scale

Categories

Resources