I often find myself having to create a line (or some other kind of shape) within a 2D array. In other words, the array is zero everywhere except where y = mx + c. (Aside: the motivation for this approach, rather than storing a line in a 1D array, is that my work often requires a 2D Fourier transform, so I need the zeros everywhere apart from the line or shape.)
My usual approach for doing this is the following:
array = numpy.zeros((height, width))
for i, line in enumerate(array):
    for j, pixel in enumerate(line):
        if j == m*i + c:
            array[i,j] = 1
This works fine, but it doesn't strike me as particularly pythonic, and it tends to get pretty slow when the array gets big. So, my question is a rather general one - does anybody know of a better way of doing this?
Thanks in advance!
You could use broadcasting here to get rid of those nested loops -
import numpy as np
out = (np.arange(width) == m*np.arange(height)[:,None]+c)+0.0
As an example to verify for correctness, with these parameters -
height = 10
width = 10
m = 0.5;
c = 6;
you would have -
In [306]: array
Out[306]:
array([[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
In [307]: out
Out[307]:
array([[ 0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
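For a non-square array it matters which axis gets the [:, None]: the row index i has to be the column vector and the column index j the row vector. A minimal sketch that checks the broadcast version against the question's loop (the names line_mask and line_loop and the 4x7 test shape are my own choices, not from the question):

```python
import numpy as np

def line_mask(height, width, m, c):
    """Return a (height, width) float array that is 1 where j == m*i + c."""
    i = np.arange(height)[:, None]   # row indices as a column vector
    j = np.arange(width)             # column indices as a row vector
    return (j == m * i + c).astype(float)

def line_loop(height, width, m, c):
    """Reference: the nested-loop version from the question."""
    out = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            if j == m * i + c:
                out[i, j] = 1
    return out

mask = line_mask(4, 7, 2, 1)
print(np.array_equal(mask, line_loop(4, 7, 2, 1)))
```

Because the test is exact equality, rows where m*i + c is not an integer stay empty, as in the m = 0.5 output above.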
The function np.fromfunction was designed for cases where an array can be constructed from the indices, such as this scenario.
In your case,
np.fromfunction(lambda i, j: j == m*i+c, (height, width), dtype=float)
would be equivalent to your approach, but using numpy's routines rather than Python for-loops.
Short demo:
import numpy as np
height, width = 10,10
m, c = 2, 4
a = np.zeros((height, width))
for i, line in enumerate(a):
    for j, pixel in enumerate(line):
        if j == m*i + c:
            a[i,j] = 1
b = np.fromfunction(lambda i, j: j == m*i+c, (height, width), dtype=float)
np.all(a==b)
# True
b.astype(int) # astype added to shorten the output (no need for all the periods)
#array([[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Edit: Even though this answer got accepted, I want to point out that @Divakar's answer is about 10 times faster on my machine. If you're looking for speed, use that answer when your problem lends itself to vectorization, as Divakar showed (not every fromfunction call can be easily vectorized). I upvoted it, because it's a nice approach to this problem.
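Since the question also asks about other shapes: np.fromfunction calls the function once with whole index arrays, so any elementwise condition on i and j should work the same way. A small sketch with a filled disc (the centre cy, cx and radius r are made-up values for illustration):

```python
import numpy as np

height, width = 7, 7
cy, cx, r = 3, 3, 2   # hypothetical centre and radius, for illustration only

# The lambda receives two (height, width) index arrays, not scalars.
disc = np.fromfunction(
    lambda i, j: (i - cy) ** 2 + (j - cx) ** 2 <= r ** 2,
    (height, width),
).astype(float)

print(disc)
```

The comparison returns a boolean array, so the astype(float) at the end is what gives the 0./1. values.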
Use np.put, but you need to build the list of flat indices first, which you can do with a list comprehension. np.put treats the array as flattened, so the element at row i, column j has flat index i*arr.shape[1] + j:
>>> np.put(arr,[i*arr.shape[1] + j for j in range(arr.shape[1]) for i in range(arr.shape[0]) if j == m*i + c],1)
Demo:
>>> arr = np.zeros((5, 3))
>>> np.put(arr,[i*arr.shape[1] + j for j in range(arr.shape[1]) for i in range(arr.shape[0]) if j == 3*i + 1],1)
>>> arr
array([[ 0., 1., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
>>> np.put(arr,[i*arr.shape[1] + j for j in range(arr.shape[1]) for i in range(arr.shape[0]) if j == 0.5*i + 2],1)
>>> arr
array([[ 0., 1., 1.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 0.]])
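If you want to stay with np.put but avoid the Python-level comprehension, one option is to compute the (row, column) pairs with array arithmetic and convert them with np.ravel_multi_index. This is only a sketch and it assumes an integer slope and intercept (the values of m and c below are my own example):

```python
import numpy as np

height, width = 5, 3
m, c = 1, 0                      # example slope and intercept (my choice)

arr = np.zeros((height, width))
rows = np.arange(height)
cols = m * rows + c              # column hit by the line in each row
inside = (cols >= 0) & (cols < width)   # drop points that fall off the array

# np.put addresses the array as if it were flattened, so convert (row, col) pairs.
flat = np.ravel_multi_index((rows[inside], cols[inside]), arr.shape)
np.put(arr, flat, 1)

print(arr)
```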
Related
Having an array filled with zeros, I want to create a view, use .ravel() on it, modify the array returned by ravel() and have this modification change the original array. Without the use of ravel() it works fine
zeros = np.zeros(shape=(10,10))
view = zeros[3:7,3:7]
view[:] = 1
print(zeros)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
However, using .ravel() creates the following:
zeros = np.zeros(shape=(10,10))
view = zeros[3:7,3:7].ravel()
view[:] =1
print(zeros)
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
As one would expect, view.flags["OWNDATA"] returns True, so a copy has been created. How can I change the code to create a view that lets me modify the original array?
I also tried:
view[:] = view[:]+1
You can't. ravel, which is just a reshape, sometimes has to make a copy. A view is possible only when the selection of values can be expressed in a regular pattern, using either scalar or slice indices.
Consider a small example with distinct values:
In [47]: x = np.arange(9).reshape(3,3).copy()
In [48]: x
Out[48]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [49]: x.base
In [50]: y = x[1:,1:]
In [51]: y
Out[51]:
array([[4, 5],
[7, 8]])
In [52]: y.base
Out[52]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [53]: z = y.ravel()
In [54]: z
Out[54]: array([4, 5, 7, 8])
In [55]: x.ravel()
Out[55]: array([0, 1, 2, 3, 4, 5, 6, 7, 8])
In [56]: z.base
y is a view, but z is not. There's no way of selecting the z values from the flat x values with a slice.
But you can use the flat iterator to index y in a flat manner:
In [59]: y.flat[2]=10
In [60]: y
Out[60]:
array([[ 4, 5],
[10, 8]])
In [61]: x
Out[61]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 10, 8]])
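Applied to the 10x10 array from the question, the same idea might look like this (a sketch; flat_copy is just a name I picked for the result of ravel()):

```python
import numpy as np

zeros = np.zeros((10, 10))
view = zeros[3:7, 3:7]

# ravel() has to copy here, because the block is not contiguous in memory.
flat_copy = view.ravel()
print(np.shares_memory(zeros, flat_copy))   # False

# Assigning through .flat writes into the original buffer instead.
view.flat[:] = 1
print(zeros[3:7, 3:7].sum())                # 16.0
```

np.shares_memory is a quick way to check whether an operation gave you a view or a copy before relying on in-place modification.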
I would like to get a matrix in shape of 100x100 like this:
[-2,1,0,0]
[1,-2,1,0]
[0,1,-2,1]
[0,0,1,-2]
I started with creating the diagonal:
import numpy as np
diagonal= (100)
diagonal= np.full(diagonal, -2)
A100 = (100,100)
A100 = np.zeros(A100)
np.fill_diagonal(A100, diagonal)
Now for changing the offset I tried:
off1=(99)
off1=np.ones(off1)
off1=np.diagonal(A100, offset=1)
But this doesn't work.
Thanks for your help!
Construct the matrix from three np.eye calls (the main diagonal plus the two diagonals shifted by k=1 and k=-1):
np.eye(100, k=1) + np.eye(100, k=-1) - 2 * np.eye(100)
P.S. This solution is 7x faster than the scipy.sparse solution.
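Wrapped in a small helper, the same construction can be checked against the 4x4 pattern from the question (the function name tridiagonal and its main/off parameters are my own, not part of NumPy):

```python
import numpy as np

def tridiagonal(n, main=-2.0, off=1.0):
    """Dense n x n matrix with `main` on the diagonal and `off` beside it."""
    return main * np.eye(n) + off * (np.eye(n, k=1) + np.eye(n, k=-1))

A4 = tridiagonal(4)
print(A4)

A100 = tridiagonal(100)   # the 100x100 case from the question
```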
You can use scipy.sparse.diags
from scipy.sparse import diags
A100 = diags([-2, 1, 1], [0, -1, 1], shape = (100, 100))
A100.toarray()
Out[]:
array([[-2., 1., 0., ..., 0., 0., 0.],
[ 1., -2., 1., ..., 0., 0., 0.],
[ 0., 1., -2., ..., 0., 0., 0.],
...,
[ 0., 0., 0., ..., -2., 1., 0.],
[ 0., 0., 0., ..., 1., -2., 1.],
[ 0., 0., 0., ..., 0., 1., -2.]])
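The sparse version mostly pays off when you keep the matrix sparse instead of converting it back. A rough sketch, assuming you want to apply the matrix to a vector (the test vector x is my own example):

```python
import numpy as np
from scipy.sparse import diags

n = 100
A = diags([-2, 1, 1], [0, -1, 1], shape=(n, n))   # stays sparse

x = np.arange(n, dtype=float) ** 2
second_diff = A @ x          # sparse matrix-vector product, no dense copy

# Dense equivalent, only built here to compare the two constructions.
dense = np.eye(n, k=1) + np.eye(n, k=-1) - 2 * np.eye(n)
print(np.array_equal(A.toarray(), dense))
```

For x[k] = k**2 the interior entries of A @ x are all 2, since (k-1)**2 - 2*k**2 + (k+1)**2 = 2.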
I have a numpy array initially with zeros, like this:
v = np.zeros((5, 5))
v
array([[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0.]])
I also have a set of arrays idx1 and idx2.
idx1
array([[0, 3],
[0, 4],
[1, 3],
[2, 4]])
idx2
array([[0, 1],
[0, 2],
[0, 4],
[1, 3]])
Look upon each pair of values as row and column indices. So, for example, in idx1, the first pair (0, 3) would be indexers into v[0, 3] and so on.
I want to first set values at indexes specified by idx1 to 1, followed by all indexes specified by idx2 to 0.
Also, please note that if there is a pair (i, j) in some array, I want to set v[i, j] and v[j, i] at the same time.
My final result becomes:
array([[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.]])
I currently achieve this by doing:
def set_vals(x, i, j, v):
    x[i, j] = x.T[i, j] = v
v = np.zeros((5, 5))
i1, j1 = idx1[:, 0], idx1[:, 1]
i2, j2 = idx2[:, 0], idx2[:, 1]
set_vals(v, i1, j1, 1)
set_vals(v, i2, j2, 0)
v # the result
However, I believe there might be a better way. Would love to hear any thoughts/suggestions for improvement. Thanks!
In search of a more "compact" way of expressing it, I got this -
v = np.zeros((5, 5))
v[tuple(np.r_[idx1,idx1[:,::-1]].T)] = 1
v[tuple(np.r_[idx2,idx2[:,::-1]].T)] = 0
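On Python 3.6+, you can also build the index with the * unpacking operator. Keep it a tuple, though: recent NumPy versions read a list of arrays as a single fancy index on the first axis, which would set whole rows instead:
v[(*np.r_[idx1,idx1[:,::-1]].T,)] = 1
v[(*np.r_[idx2,idx2[:,::-1]].T,)] = 0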
v
array([[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 1.],
[ 1., 0., 0., 0., 0.],
[ 0., 0., 1., 0., 0.]])
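If this comes up in more than one place, the indexing can be hidden in a helper. A minimal sketch (set_symmetric is a name I made up, and it uses np.concatenate rather than np.r_, which should be equivalent here):

```python
import numpy as np

def set_symmetric(x, pairs, value):
    """Set x[i, j] and x[j, i] to `value` for every (i, j) row in `pairs`."""
    both = np.concatenate([pairs, pairs[:, ::-1]])   # add the mirrored pairs
    x[tuple(both.T)] = value                         # index must be a tuple

idx1 = np.array([[0, 3], [0, 4], [1, 3], [2, 4]])
idx2 = np.array([[0, 1], [0, 2], [0, 4], [1, 3]])

v = np.zeros((5, 5))
set_symmetric(v, idx1, 1)
set_symmetric(v, idx2, 0)
print(v)
```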
I need to convert one-hot encoding to categories represented by unique integers. So one-hot encoding created with the following code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
for x in [1,2,3]:
    print(enc.transform([[x]]).toarray())
Out:
[[ 1. 0. 0.]]
[[ 0. 1. 0.]]
[[ 0. 0. 1.]]
Could be converted back to a set of unique integers, for example:
[1,2,3] or [11,37, 45] or any other where each integer uniquely represents a single class.
Is it possible to do with scikit-learn or any other python lib?
* Update *
Tried to:
labels = [[1],[2],[3], [4], [5],[6],[7]]
enc.fit(labels)
lst = []
for x in [1,2,3,4,5,6,7]:
    lst.append(enc.transform([[x]]).toarray())
lst
Out:
[array([[ 1., 0., 0., 0., 0., 0., 0.]]),
array([[ 0., 1., 0., 0., 0., 0., 0.]]),
array([[ 0., 0., 1., 0., 0., 0., 0.]]),
array([[ 0., 0., 0., 1., 0., 0., 0.]]),
array([[ 0., 0., 0., 0., 1., 0., 0.]]),
array([[ 0., 0., 0., 0., 0., 1., 0.]]),
array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a = np.array(lst)
np.where(a==1)[1]
Out:
array([0, 0, 0, 0, 0, 0, 0], dtype=int64)
Not what I need
You can do that using np.where as follows:
import numpy as np
a=np.array([[ 0., 1., 0.],
[ 1., 0., 0.],
[ 0., 0., 1.]])
np.where(a==1)[1]
This prints array([1, 0, 2], dtype=int64). This works since np.where(a==1)[1] returns the column indices of the 1's, which are exactly the labels.
In addition, since a is a 0,1-matrix, you can also replace np.where(a==1)[1] with just np.where(a)[1].
Update: The following solution should work with your format:
l=[np.array([[ 1., 0., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 1., 0., 0., 0., 0.]]),
np.array([[ 0., 1., 0., 0., 0., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 1., 0., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 1., 0.]]),
np.array([[ 0., 0., 0., 0., 0., 0., 1.]])]
a=np.array(l)
np.where(a)[2]
This prints
array([0, 2, 1, 4, 4, 5, 6], dtype=int64)
Alternatively, you could use the original solution together with @ml4294's comment.
You can use np.argmax():
from sklearn.preprocessing import OneHotEncoder
import numpy as np
enc = OneHotEncoder()
labels = [[1],[2],[3]]
enc.fit(labels)
x = enc.transform(labels).toarray()
# x = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
xr = (np.argmax(x, axis=1)+1).reshape(-1, 1)
print(xr)
This should return array([[1], [2], [3]]). If you want instead array([[0], [1], [2]]), just remove the +1 in the definition of xr.
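The question also mentions arbitrary integers such as [11, 37, 45]. One way to get those, sketched here with plain NumPy, is to use the argmax result as an index into an array of class labels (the classes array is a hypothetical mapping from column to label):

```python
import numpy as np

onehot = np.array([[0., 1., 0.],
                   [1., 0., 0.],
                   [0., 0., 1.]])
classes = np.array([11, 37, 45])   # hypothetical label for each column

labels = classes[np.argmax(onehot, axis=1)]
print(labels)   # [37 11 45]
```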
Since you are using sklearn.preprocessing.OneHotEncoder to 'encode' the data, you can use its .inverse_transform() method to 'decode' the data (I think this requires scikit-learn version 0.20.1 or newer):
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
labels = [[1],[2],[3]]
encoder = enc.fit(labels)
encoded_labels = encoder.transform(labels)
decoded_labels = encoder.inverse_transform(encoded_labels)
decoded_labels # array([[1],
               #        [2],
               #        [3]])
n.b. decoded_labels is a numpy array not a list.
Source: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder.inverse_transform
Is there a way I can allocate memory for scipy sparse matrix functions to process large datasets?
Specifically, I'm attempting to use Asymmetric Least Squares Smoothing (translated into python here and the original here) to perform a baseline correction on a large mass spec dataset (length of ~60,000).
The function (see below) uses the scipy.sparse matrix operations.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
def baseline_als(y, lam, p, niter):
    L = len(y)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):  # xrange on Python 2
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        z = spsolve(Z, w*y)
        w = p * (y > z) + (1-p) * (y < z)
    return z
I have no problem when I pass data sets that are 10,000 or less in length:
baseline_als(np.ones(10000),100,0.1,10)
But when passing larger data sets, e.g.
baseline_als(np.ones(50000), 100, 0.1, 10)
I get a MemoryError, for the line
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
Try changing
D = sparse.csc_matrix(np.diff(np.eye(L), 2))
to
diag = np.ones(L - 2)
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
D will be a sparse matrix in DIAgonal format. If it turns out that being in CSC format is important, convert it using the tocsc() method:
D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2).tocsc()
The following example shows that the old and new versions generate the same matrix:
In [67]: from scipy import sparse
In [68]: L = 8
Original:
In [69]: D = sparse.csc_matrix(np.diff(np.eye(L), 2))
In [70]: D.toarray()
Out[70]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
New version:
In [71]: diag = np.ones(L - 2)
In [72]: D = sparse.spdiags([diag, -2*diag, diag], [0, -1, -2], L, L-2)
In [73]: D.toarray()
Out[73]:
array([[ 1., 0., 0., 0., 0., 0.],
[-2., 1., 0., 0., 0., 0.],
[ 1., -2., 1., 0., 0., 0.],
[ 0., 1., -2., 1., 0., 0.],
[ 0., 0., 1., -2., 1., 0.],
[ 0., 0., 0., 1., -2., 1.],
[ 0., 0., 0., 0., 1., -2.],
[ 0., 0., 0., 0., 0., 1.]])
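To see why the original line runs out of memory, it helps to estimate the size of the dense np.eye(L) intermediate. The sketch below does that arithmetic and wraps the sparse construction in a helper (second_diff_matrix is a name I chose, not a SciPy function):

```python
import numpy as np
from scipy import sparse

def second_diff_matrix(L):
    """Sparse (L, L-2) second-difference matrix, built without a dense np.eye(L)."""
    diag = np.ones(L - 2)
    return sparse.spdiags([diag, -2 * diag, diag], [0, -1, -2], L, L - 2).tocsc()

# Rough size of the dense intermediate for L = 50000 (float64 is 8 bytes).
dense_bytes = 50000 * 50000 * 8
print(dense_bytes / 1e9)             # 20.0 (GB)

D = second_diff_matrix(50000)
print(D.shape, D.nnz)
```

The sparse matrix only stores about 3*(L-2) values, so it should fit comfortably where the dense version does not.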