I am new to arrays in Python, please help me.
I have a multidimensional numpy array like this:
array([[ 0., 2073., 2352., 1119., 2074., 1344., 4035., 1980., 2213.,
2363., 2655., 2322., 1148., 2046., 2234., 1076., 1647., 2957.,
1968., 2246., 1723.],
[1517., 0., 891., 1537., 1993., 2231., 2574., 689., 1561.,
2157., 1517., 3275., 1566., 757., 774., 2190., 822., 1355.,
2152., 1575., 1064.],
[1597., 1329., 0., 1617., 1106., 1345., 1951., 1551., 1938.,
1270., 629., 2320., 1646., 1619., 862., 2267., 1357., 934.,
1264., 687., 342.]])
I want to add a 0 at the beginning of every row, and at the end I want to append a row of 22 zeros, so it becomes like this:
array([[0., 0., 2073., 2352., 1119., 2074., 1344., 4035., 1980., 2213.,
2363., 2655., 2322., 1148., 2046., 2234., 1076., 1647., 2957.,
1968., 2246., 1723.],
[0., 1517., 0., 891., 1537., 1993., 2231., 2574., 689., 1561.,
2157., 1517., 3275., 1566., 757., 774., 2190., 822., 1355.,
2152., 1575., 1064.],
[0., 1597., 1329., 0., 1617., 1106., 1345., 1951., 1551., 1938.,
1270., 629., 2320., 1646., 1619., 862., 2267., 1357., 934.,
1264., 687., 342.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]])
Please help me code this.
If the array is arr then you can use:
np.pad(arr, ((0, 1), (1, 0)))
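The pad widths read as ((rows before, rows after), (columns before, columns after)), so this adds one column of zeros at the front and one row of zeros at the bottom. A minimal sketch on a toy array, assuming the default constant mode (which pads with zeros):
import numpy as np

arr = np.arange(6, dtype=float).reshape(2, 3)
padded = np.pad(arr, ((0, 1), (1, 0)))  # 1 row after, 1 column before
print(padded)
# [[0. 0. 1. 2.]
#  [0. 3. 4. 5.]
#  [0. 0. 0. 0.]]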
You can insert a 0 at the beginning of every row and then append a row of 22 zeros.
import numpy as np
data = np.array([[0., 2073., 2352., 1119., 2074., 1344., 4035., 1980., 2213.,
                  2363., 2655., 2322., 1148., 2046., 2234., 1076., 1647., 2957.,
                  1968., 2246., 1723.],
                 [1517., 0., 891., 1537., 1993., 2231., 2574., 689., 1561.,
                  2157., 1517., 3275., 1566., 757., 774., 2190., 822., 1355.,
                  2152., 1575., 1064.],
                 [1597., 1329., 0., 1617., 1106., 1345., 1951., 1551., 1938.,
                  1270., 629., 2320., 1646., 1619., 862., 2267., 1357., 934.,
                  1264., 687., 342.]])
updated = np.insert(data, 0, 0, axis=1)
updated = np.append(updated, [[0] * 22], axis=0)
print(updated)
Output:
[[ 0. 0. 2073. 2352. 1119. 2074. 1344. 4035. 1980. 2213. 2363. 2655.
2322. 1148. 2046. 2234. 1076. 1647. 2957. 1968. 2246. 1723.]
[ 0. 1517. 0. 891. 1537. 1993. 2231. 2574. 689. 1561. 2157. 1517.
3275. 1566. 757. 774. 2190. 822. 1355. 2152. 1575. 1064.]
[ 0. 1597. 1329. 0. 1617. 1106. 1345. 1951. 1551. 1938. 1270. 629.
2320. 1646. 1619. 862. 2267. 1357. 934. 1264. 687. 342.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
Explanation:
np.insert with axis=1 inserts a 0 at the start of every row of the existing multidimensional array.
np.append with axis=0 appends the row of 22 zeros at the end.
References:
NumPy documentation on the insert method
NumPy documentation on the append method
Let
import numpy as np
M = np.array([[ 1., -0.5301332 , 0.80512845],
[ 0., 0., 0.],
[ 0., 0., 0.]])
M is rank one; its only nonzero eigenvalue is 1 (its trace). However, np.linalg.norm(M, ord=2) returns 1.39, which is strictly greater than 1. Why?
The eigenvalues of M, returned by np.linalg.eigvals, are 1, 0, 0, but the singular values of M are 1.39, 0, 0, which is a surprise to me. What did I miss?
In this particular case (M has a single nonzero singular value) the 2-norm of M coincides with the Frobenius norm, which is given by the formula (np.sum(np.abs(M)**2))**(1/2), therefore we can see that:
import numpy as np
M = np.array([[ 1., -0.5301332 , 0.80512845],
[ 0., 0., 0.],
[ 0., 0., 0.]])
np.sqrt(np.sum(np.abs(M)**2))
1.388982732341062
np.sqrt(np.sum(np.abs(M)**2)) == np.linalg.norm(M, ord=2) == np.linalg.norm(M, ord='fro')
True
In particular, one can prove that the 2-norm is the square root of the largest eigenvalue of M.T @ M, i.e.
np.sqrt(np.linalg.eigvals(M.T @ M)[0])
1.388982732341062
And this is its relation to the eigenvalues of a matrix. Now recall that the singular values are the square roots of the eigenvalues of M.T @ M, and we unpack the mystery.
Using a characterization of the Frobenius norm (the square root of the trace of M.T @ M):
np.sqrt(np.sum(np.diag(M.T @ M)))
1.388982732341062
Comparing the results:
np.sqrt(np.linalg.eigvals(M.T @ M)[0]) == np.sqrt(np.sum(np.diag(M.T @ M))) == np.linalg.svd(M)[1][0]
True
The 2-norm here is the square root of the sum of all the squared elements; it coincides with the Frobenius norm because M has rank one:
norm(M, ord=2) = (1.**2 + 0.5301332**2 + 0.80512845**2)**0.5 = 1.39
To get the relation between the eigenvalues and the singular values, you calculate the eigenvalues of M^H · M and take their square root:
eigV = np.linalg.eigvals(M.T.dot(M))
array([1.92927303, 0. , 0. ])
eigV**0.5
array([1.38898273, 0. , 0. ])
This is perfectly normal. In the general case, the singular values are not equal to the eigenvalues. They are equal only for positive semidefinite Hermitian matrices.
For square matrices, you have the following relationship:
import numpy as np

M = np.matrix([[ 1., -0.5301332 , 0.80512845],
               [ 0., 0., 0.],
               [ 0., 0., 0.]])
u, v = np.linalg.eig(M.H @ M)  # M.H @ M is Hermitian
print(np.sqrt(u))              # [1.38898273 0.         0.        ]
u, s, v = np.linalg.svd(M)
print(s)                       # [1.38898273 0.         0.        ]
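As a quick check of that claim (a sketch of mine, not from the original answer): building a symmetric positive semidefinite matrix as A.T @ A, its singular values and eigenvalues do coincide:
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = A.T @ A  # symmetric positive semidefinite by construction

eig = np.sort(np.linalg.eigvalsh(P))[::-1]  # eigenvalues, descending
sv = np.linalg.svd(P, compute_uv=False)     # singular values, descending
print(np.allclose(eig, sv))                 # True for a PSD matrix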
I have this piece of code in Python
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        A[x,y] = A[x,y] + 1
where
A.shape = (N,M)
ax.shape = ay.shape = (L,)
rx.shape = ry.shape = (K,)
I wanted to vectorize it or otherwise make it more efficient, i.e. faster, and if possible more economical in memory consumption. Here, my ax and ay are absolute coordinates into the array A, while rx and ry are relative coordinates. So I'm updating the counter array A.
My table A can be 1000x1000, while ax, ay are 100x1 and rx, ry are 300x1. The whole thing is inside a loop, so preferably the optimized code shouldn't keep creating temporary arrays the size of A.
This question is related to the one I asked before, but it's not directly applicable to this situation due to the way increment works. Here's an example.
This code does exactly what I want:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
for i in range(len(ax)):
    for j in range(len(rx)):
        x = ax[i] + rx[j]
        y = ay[i] + ry[j]
        print(x,y)
        A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 3., 0., 0., 0.],
[ 0., 2., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
However, the following code doesn't work, because with fancy-index assignment the right-hand side is evaluated once up front, so repeated indices only get incremented once:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0])
ry = np.array([0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
A[x,y] = A[x,y] + 1
A
array([[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
This solution gives the correct numbers, but it's probably not the fastest, because np.add.at() is an unbuffered operation:
import numpy as np
A = np.zeros((4,5))
ax = np.arange(1,3)
ay = np.array([1,1])
rx = np.array([-1,0,0])
ry = np.array([0,0,0])
x = ax + rx[:,np.newaxis]
y = ay + ry[:,np.newaxis]
np.add.at(A,[x,y],1)
A
Here's one leveraging broadcasting, getting linear indices, which are then fed to the very efficient np.bincount for binned summations -
m,n = 4,5 # shape of output array
X = ax[:,None] + rx
Y = ay[:,None] + ry
Aout = np.bincount((X*n + Y).ravel(), minlength=m*n).reshape(m,n)
Alternative one with np.flatnonzero -
idx = (X*n + Y).ravel()
idx.sort()
m = np.r_[True,idx[1:] != idx[:-1],True]
A.ravel()[idx[m[:-1]]] = np.diff(np.flatnonzero(m))
If you are adding into A iteratively, replace = with += there at the last step.
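For reference, a self-contained sketch (using the toy arrays from the question) that checks the bincount route against the plain double loop:
import numpy as np

A = np.zeros((4, 5))
ax = np.arange(1, 3); ay = np.array([1, 1])
rx = np.array([-1, 0, 0]); ry = np.array([0, 0, 0])

# reference: the original double loop
for i in range(len(ax)):
    for j in range(len(rx)):
        A[ax[i] + rx[j], ay[i] + ry[j]] += 1

# broadcasting + linear indices + bincount
m, n = A.shape
X = ax[:, None] + rx
Y = ay[:, None] + ry
Aout = np.bincount((X*n + Y).ravel(), minlength=m*n).reshape(m, n)
print(np.array_equal(A, Aout))  # True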
I am very new to Python... and I am having a hard time plugging the contents of my 1d array into a nonlinear equation so I can ultimately plot the results. My code is below:
import numpy as np
import matplotlib.pyplot as plt
def readfiles(file_list):
    """Read <TAB>-delimited files as strings,
    ignoring '# Comment' lines."""
    data = []
    for fname in file_list:
        data.append(
            np.genfromtxt(fname,
                          comments='#',  # skip comment lines
                          delimiter='\t',
                          dtype="|S", autostrip=True).T)
    return data
data = readfiles(['CR1000_rawMeasurements_15m.txt'])
def column(matrix, i):
    return [row[i] for row in matrix]

x = column(data, 18)

for i in x:
    thermTemp1_degC = 1/(1.401E-3 + 2.377E-4*np.log(i) + 9.730E-8*np.log(i)**3) - 273.15
All I have been successfully able to do is extract the column I need from my data. When I run this script, I get 'TypeError: Not implemented for this type.' (my 1d array, x, is just a column of zeros right now.) How can I fix this?
There are a few points to address here.
Returning the Correct Column
The array you've given in the comments is a little strange, but you can retrieve the columns with numpy:
data = [[[ 737055., 0.], [ 737055., 0.], [ 737055., 0.], [ 737055., 0.], [ 737055., 0.], [ 735773., 0.], [ 735773., 0.], [ 735773., 0.]]]
data
=> [[[737055.0, 0.0],
[737055.0, 0.0],
[737055.0, 0.0],
[737055.0, 0.0],
[737055.0, 0.0],
[735773.0, 0.0],
[735773.0, 0.0],
[735773.0, 0.0]]]
column_0 = np.array(data)[0][:, 0]
column_1 = np.array(data)[0][:, 1]
column_0
=> array([ 737055., 737055., 737055., 737055., 737055., 735773.,
735773., 735773.])
column_1
=> array([ 0., 0., 0., 0., 0., 0., 0., 0.])
Performing the Calculation
As x is a numpy array (if you use the above column code), you don't need to put this in a for loop:
thermTemp1_degC = 1/(1.401E-3 + 2.377E-4*np.log(x) + 9.730E-8*np.log(x)**3) - 273.15
Here thermTemp1_degC is a numpy array of the same size as x.
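A likely cause of the TypeError itself: readfiles reads everything with dtype="|S", so the column contains byte strings, and np.log is not implemented for strings. A minimal sketch of the fix, assuming the column holds numeric text and reusing the question's column helper:
import numpy as np

# cast the byte strings from genfromtxt(..., dtype="|S") to floats first
x = np.asarray(column(data, 18), dtype=float)
thermTemp1_degC = 1/(1.401E-3 + 2.377E-4*np.log(x) + 9.730E-8*np.log(x)**3) - 273.15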
I have a list of lists with 1,200 rows and 500,000 columns. How do I convert it into a numpy array?
I've read the solutions on Bypass "Array is too big" python error but they are not helping.
I tried to put them into a numpy array:
import random
import numpy as np
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
np.array(lol)
[Error]:
ValueError: array is too big.
Then I tried pandas:
import random
import pandas as pd
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
pd.lib.to_object_array(lol).astype(float)
[Error]:
ValueError: array is too big.
I've also tried hdf5 as @askewchan suggested:
import h5py
filearray = h5py.File('project.data','w')
data = filearray.create_dataset('tocluster',(len(data),len(data[0])),dtype='f')
data[...] = data
[Error]:
data[...] = data
File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 367, in __setitem__
val = numpy.asarray(val, order='C')
File "/usr/local/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
File "/usr/lib/python2.7/dist-packages/h5py/_hl/dataset.py", line 455, in __array__
arr = numpy.empty(self.shape, dtype=self.dtype if dtype is None else dtype)
ValueError: array is too big.
This post shows that I can store a huge numpy array on disk: Python: how to store a numpy multidimensional array in PyTables?. But I can't even get my list of lists into a numpy array =(
On a system with 32GB of RAM and 64-bit Python your code:
import random
import numpy as np
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
np.array(lol)
works just fine for me but it's probably not the best route to take. This is the kind of thing PyTables was built for. Since you're dealing with homogeneous data you can use the Array class or, better yet, the CArray class (which supports compression). This can be done as follows:
import numpy as np
import tables as pt
# Create container
h5 = pt.open_file('myarray.h5', 'w')
filters = pt.Filters(complevel=6, complib='blosc')
carr = h5.create_carray('/', 'carray', atom=pt.Float32Atom(), shape=(1200, 500000), filters=filters)
# Fill the array
m, n = carr.shape
for j in xrange(m):
    carr[j,:] = np.random.randn(n)
h5.close() # "myarray.h5" (~2.2 GB)
# Open file
h5 = pt.open_file('myarray.h5', 'r')
carr = h5.root.carray
# Display some numbers from array
print carr[973:975, :4]
print carr.dtype
If you print carr.flavor it will return 'numpy'. You can use this carr in the same way you can use a NumPy array. The information is stored on disk but is still quite fast.
With h5py / hdf5:
import numpy as np
import h5py
lol = np.empty((1200, 5000)).tolist()
f = h5py.File('big.hdf5', 'w')
bd = f.create_dataset('big_dataset', (len(lol), len(lol[0])), dtype='f')
bd[...] = lol
Then, I believe you can access your big dataset bd as if it were an array, but it is stored and accessed from disk, not memory:
In [14]: bd[0, 1:10]
Out[14]:
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
And you can have several 'datasets' in the one file (multiple arrays).
abd = f.create_dataset('another_big_dataset', (len(lol), len(lol[0])), dtype='f')
abd[...] = lol
abd += 10
Then:
In [24]: abd[:3, :10]
Out[24]:
array([[ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
[ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
[ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.]], dtype=float32)
In [25]: bd[:3, :10]
Out[25]:
array([[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
My computer can't handle your example, so I can't test this with an array of your size, but I hope it works!
Depending on what you want to do with your array, you might have more luck with pytables, which does a lot more than h5py.
See also:
Python Numpy Very Large Matrices
exporting from/importing to numpy, scipy in SQLite and HDF5 formats
Have you tried assigning a dtype? This works for me.
import random
import numpy as np
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
ar = np.array(lol, dtype=np.float64)
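Worth noting: a float64 array of this shape needs roughly 4.8 GB of RAM, so whether np.array succeeds depends on available memory and a 64-bit Python. A quick back-of-the-envelope check:
import numpy as np

# 1200 rows x 500000 columns x 8 bytes per float64
nbytes = 1200 * 500000 * np.dtype(np.float64).itemsize
print(nbytes / 1e9)  # ~4.8 GB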
Another option is to use blaze. http://blaze.pydata.org/
import random
import blaze
lol = [[random.uniform(0,1) for j in range(500000)] for i in range(1200)]
ar = blaze.array(lol)
The problem seems to be that you are using something (either the OS or Python) that is only 32-bit, which is the source of the size limitation. The solution is to upgrade to 64-bit.
An alternative is the following:
lol = np.empty((1200,500000))
for i in range(lol.shape[0]):
    lol[i] = [random.uniform(0,1) for j in range(lol.shape[1])]
This is reasonably close to your initial form, I hope it can fit into your code. I cannot test with your numbers, as I don't have enough RAM to handle the array.
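If the values are uniform random draws anyway, a variant of this (my sketch, not part of the original answer) is to let NumPy fill each row directly, which avoids building the intermediate Python lists:
import numpy as np

lol = np.empty((1200, 500000))
for i in range(lol.shape[0]):
    # fill one row at a time without materializing a Python list
    lol[i] = np.random.uniform(0, 1, lol.shape[1])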