I have this list:
a = [ np.array([ 1, 2]), np.array([0])]
I want to iterate:
x = np.array([t[i] for i, t in enumerate(a)])
but since np.array([0]) has only one element, it will throw an error.
So I thought of padding np.array([0]) with one more zero, so that:
a = [ np.array([ 1, 2]), np.array([0,0])]
x = np.array([t[i] for i, t in enumerate(a)])
print(x)
[1 0]
So, I am finding the biggest length in the list:
temp = []
for i in a:
    temp.append(len(i))
themax = max(temp)
which is 2 (the length of np.array([1, 2])).
Now I must somehow fill the other sub-arrays with zeros.
Note that I will always have the np.array([0]) element, which is what causes the problem.
The easiest way would be to change your list comprehension to give a zero instead of an array element in the case of an array being too small:
x = np.asarray([(t[i] if i < t.shape[0] else 0.) for i, t in enumerate(a)])
This is more efficient as you don't have to expand all arrays with zeros.
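If you do want the padded list from the question, here is a minimal sketch. It assumes every array in a is 1-D and uses np.pad with constant (zero) padding; the name padded is just illustrative.

```python
import numpy as np

a = [np.array([1, 2]), np.array([0])]

# Length of the longest array in the list
themax = max(len(t) for t in a)

# Pad each array on the right with zeros up to that length
padded = [np.pad(t, (0, themax - len(t)), mode="constant") for t in a]

# The original comprehension now works without an IndexError
x = np.array([t[i] for i, t in enumerate(padded)])
print(x)  # [1 0]
```

This copies every array, so the conditional in the list comprehension above remains the cheaper option when only the diagonal elements are needed.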
I would like to build up a numpy matrix using rows I get in a loop. But how do I initialize the matrix? If I write
A = []
A = numpy.vstack((A, [1, 2]))
I get
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What's the best practice for this?
NOTE: I do not know the number of rows in advance. The number of columns is known.
Unknown number of rows
One way is to form a list of lists, and then convert to a numpy array in one operation:
final = []
# x is some generator that yields one row at a time
for item in x:
    final.append(item)
A = np.array(final)
Or, more elegantly, given a generator x:
A = np.array(list(x))
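This solution is time-efficient but memory-inefficient, since the rows are held in a Python list as well as in the final array at the moment of conversion.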
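A small sketch of the collect-then-convert idea, next to the fix for the vstack error from the question. The generator rows() is hypothetical and only stands in for x. Starting from an array of shape (0, 2) instead of [] makes vstack work, but each call copies everything accumulated so far.

```python
import numpy as np

def rows():
    # Hypothetical generator standing in for x; yields one 2-column row at a time
    for k in range(4):
        yield [k, 2 * k]

# Collect first, convert once
A = np.array(list(rows()))
print(A.shape)  # (4, 2)

# Growing an array directly works if it starts with shape (0, n_columns),
# but every vstack copies all rows accumulated so far
B = np.empty((0, 2))
for row in rows():
    B = np.vstack((B, row))
print(B.shape)  # (4, 2)
```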
Known number of rows
Append operations on numpy arrays are expensive and not recommended. If you know the size of the final array in advance, you can instantiate an empty (or zero) array of your desired size, and then fill it with values. For example:
A = np.zeros((10, 2))
A[0] = [1, 2]
Or in a loop, with a trivial assignment to demonstrate syntax:
A = np.zeros((2, 2))
# in reality, x will be some generator whose length you know in advance
x = [[1, 2], [3, 4]]
for idx, item in enumerate(x):
    A[idx] = item
print(A)
[[1. 2.]
 [3. 4.]]
I have a list of ND arrays (vectors); each vector has shape (1,300).
My goal is to find duplicate vectors inside the list, sum them, and divide the sum by the size of the list; the resulting vector then replaces each duplicate.
For example, if a is a list of ND arrays, a = [[2,3,1],[5,65,-1],[2,3,1]], then the first and the last elements are duplicates.
Their sum would be [4,6,2],
which is divided by the size of the list of vectors, size = 3.
Output: a = [[4/3,6/3,2/3],[5,65,-1],[4/3,6/3,2/3]]
I have tried to use a Counter but it doesn't work for ndarrays.
What is the Numpy way?
Thanks.
If you have numpy 1.13 or higher, this is pretty simple:
def f(a):
    u, inv, c = np.unique(a, return_counts=True, return_inverse=True, axis=0)
    p = np.where(c > 1, c / a.shape[0], 1)[:, None]
    return (u * p)[inv]
If you don't have 1.13, you'll need some trick to convert a into a 1-d array first. I recommend @Jaime's excellent answer using np.void.
How it works:
u is the unique rows of a (usually not in their original order)
c is the number of times each row of u is repeated in a
inv is the indices that map u back to a, i.e. u[inv] == a
p is the multiplier for each row of u based on your requirements. 1 if c == 1 and c / n (where n is the number of rows in a) if c > 1. [:, None] turns it into a column vector so that it broadcasts well with u
return u * p indexed back to their original locations by [inv]
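Here is the same idea run on the example from the question, as a sketch. The name average_duplicates is illustrative, the input is assumed to be convertible to a 2-D float array of shape (n, k), and the inv.ravel() call is a defensive assumption for NumPy versions that return a non-flat inverse when axis is given.

```python
import numpy as np

def average_duplicates(a):
    a = np.asarray(a, dtype=float)
    # u: unique rows, inv: position of each original row in u, c: row counts
    u, inv, c = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    inv = inv.ravel()  # defensive: make sure the inverse index is 1-D
    # Duplicated rows are scaled by count / n, unique rows are left unchanged
    p = np.where(c > 1, c / a.shape[0], 1.0)[:, None]
    return (u * p)[inv]

a = [[2, 3, 1], [5, 65, -1], [2, 3, 1]]
result = average_duplicates(a)
print(result)
```

The first and last rows come out as [4/3, 2, 2/3] and the middle row stays [5, 65, -1].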
You can use np.unique with return_counts:
elements, count = np.unique(a, axis=0, return_counts=True)
return_counts makes np.unique also return the number of occurrences of each unique row in the array.
The output looks like this:
(array([[ 2, 3, 1],
[ 5, 65, -1]]), array([2, 1]))
Then you can multiply each unique row by its count to get the sums:
(count * elements.T).T
Output:
array([[ 4, 6, 2],
[ 5, 65, -1]])
I would like to find the indices of a value in a symbolic matrix. For example, I want to find the indices of the entries that contain the symbol 'Zc' in the Q matrix below.
from sympy import symbols, Matrix
Zc,Yc,L=symbols("Zc Yc L",real=True)
Q=Matrix(([0,Zc,-Yc],[-Zc*L,0,L/2],[Yc,-L/2,0]))
The expected answer is [(0,1),(1,0)].
I tried numpy.where, but it returned an empty result. That is,
numpy.where(A in K_P.free_symbols)
gave an empty result:
(array([0], dtype=int64)
Part 2:
If the Q matrix is
Q=Matrix(([0,Zc*L/6,-Yc],[-L*Zc/12,0,L/2],[Yc,-L*Zc*Yc/2,0]))
How should I proceed if I would like to find the indices based on a product of symbols, i.e., Zc*L? It should give me the index if the entry contains Zc*L or L*Zc, but not the index of -L*Zc*Yc/2. So the expected answer is [(0,1),(1,0)].
This is really a NumPy-style operation; it's not particularly mathematical or Matrix-like. To use NumPy methods, first convert Q to a NumPy array, then search through the array, applying lambda expr: Zc in expr.free_symbols to each element. Putting everything together,
idx = np.nonzero(np.vectorize(lambda expr: Zc in expr.free_symbols)(np.array(Q)))
returns (array([0, 1]), array([1, 0]))
Caution: this is not a list of (row, column) pairs; it is a tuple ([row coordinates], [column coordinates]). For example, if there were also a Zc in the bottom-right corner, then idx would be
(array([0, 1, 2]), array([1, 0, 2]))
A more readable form, and the one you probably want, is obtained with np.array(idx).T:
array([[0, 1],
[1, 0],
[2, 2]])
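A self-contained sketch of the NumPy route, run on the first Q matrix from the question. It assumes the -Y entry there was meant to be -Yc, and it passes otypes=[bool] to np.vectorize so the output dtype does not depend on the first element. The names contains_Zc and pairs are illustrative.

```python
import numpy as np
from sympy import Matrix, symbols

Zc, Yc, L = symbols("Zc Yc L", real=True)
# Assumption: the question's -Y is read as -Yc
Q = Matrix([[0, Zc, -Yc], [-Zc * L, 0, L / 2], [Yc, -L / 2, 0]])

# Element-wise test: does this entry contain the symbol Zc?
contains_Zc = np.vectorize(lambda expr: Zc in expr.free_symbols, otypes=[bool])

# np.array(Q) gives an object array of SymPy expressions
idx = np.nonzero(contains_Zc(np.array(Q)))

# Turn (row indices, column indices) into a list of (row, column) pairs
pairs = [tuple(int(v) for v in rc) for rc in np.array(idx).T]
print(pairs)  # [(0, 1), (1, 0)]
```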
Or, without NumPy, just loop over the elements and report what you found.
found = []
# Q is 3x3, so loop over all three rows and columns
for i in range(3):
    for j in range(3):
        if Zc in Q[i, j].free_symbols:
            found.append([i, j])
print(found)
Here is a modification of this loop where we search for an expression, such as the product of two symbols. The test is whether the entry changes when that product is replaced by 0. Also, apparently you don't want other symbols present? A check for that is added too.
found = []
for i in range(3):
    for j in range(3):
        if Q[i, j].subs(Zc*L, 0) != Q[i, j] and Q[i, j].free_symbols == set([L, Zc]):
            found.append([i, j])
print(found)
If I have a matrix:
A = [[0,1,2], [0,2,3], [1,5,6]]
I want to find the rows whose first two elements are 0 and 1, and get the result
0
and find the rows whose first element is 0, and get the result
[0,1]
How should I do this? What is the fastest way?
I'd do it like this:
>>> A = [[0,1,2], [0,2,3], [1,5,6]]
>>> [i for i, row in enumerate(A) if row[:2] == [0, 1]]
[0]
>>> [i for i, row in enumerate(A) if row[0] == 0]
[0, 1]
If you want to create both results at the same time, use a regular loop as the above would iterate your matrix twice:
res_0, res_01 = list(), list()
for i, row in enumerate(A):
    if row[0] == 0:
        res_0.append(i)
        # only rows that already start with 0 can start with 0, 1
        if row[1] == 1:
            res_01.append(i)
Since your question is tagged as numpy, I'm going to assume that A is a numpy array rather than a set of nested lists as you've shown it.
In that case you can use a combination of slice indexing, vectorized logical comparisons, np.all, and np.where:
import numpy as np
A = np.array([[0,1,2], [0,2,3], [1,5,6]])
print(np.where(np.all(A[:, :2] == np.array([0, 1]), axis=1))[0])
# [0]
To break that down a bit:
# index all of the rows and the first two columns of A
print(A[:, :2])
# [[0 1]
# [0 2]
# [1 5]]
# for each row, is the first column equal to 0, and is the second equal to 1?
print(A[:, :2] == np.array([0, 1]))
# [[ True True]
# [ True False]
# [False False]]
# do the elements in *both* columns match [0, 1]?
print(np.all(A[:, :2] == np.array([0, 1]), axis=1))
# [ True False False]
# get the indices of the rows for which the above statement is true
print(np.where(np.all(A[:, :2] == np.array([0, 1]), axis=1))[0])
# [0]
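The same pattern covers the second query from the question (rows whose first element is 0). A short sketch with a hypothetical helper, rows_with_prefix, that handles a prefix of any length; np.flatnonzero is used here as a shortcut for np.where(...)[0] on a 1-D mask.

```python
import numpy as np

def rows_with_prefix(A, prefix):
    """Indices of the rows of A whose leading elements equal prefix."""
    prefix = np.asarray(prefix)
    # Compare only the first len(prefix) columns, then require all to match
    matches = np.all(A[:, :len(prefix)] == prefix, axis=1)
    return np.flatnonzero(matches)

A = np.array([[0, 1, 2], [0, 2, 3], [1, 5, 6]])
print(rows_with_prefix(A, [0, 1]))  # [0]
print(rows_with_prefix(A, [0]))     # [0 1]
```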
The answer by schore seems good, but you asked for a fast way, and for loops get quite slow for large arrays. Since your question is tagged numpy, I assumed you are using it and had a try at fancy indexing.
A = np.array([[0,1,2], [0,2,3], [1,5,6]])
indices = np.array(range(len(A)))
check01 = A[:,:2] ==[0,1]
result01 = (check01)[:,0] * (check01)[:,1]
print(indices[result01])
result0 = (A == 0)[:,0]
print(indices[result0])
this gives me
[0]
[0 1]
I am by no means an expert in writing fast code and there may be even nicer ways. Try to print out the things I define on the way to get a better understanding.
I have two 2D arrays, e.g.,
A = [[1,0],[2,0],[3,0],[4,0]]
B = [[2,0.3],[4,0.1]]
The real arrays are much larger, though: A has about 100,000 rows and is about 10x the size of B. I want to replace rows in A with the row in B whenever the 1st elements of the rows match, and leave the other rows in A unchanged. In the above example, I want to end up with:
[[1,0],[2,0.3],[3,0],[4,0.1]]
How do I do this, preferably efficiently?
We will have to iterate through the entire array A once in any case, since we are transforming it. What we could speed up though, is the look-up if a particular first element of A exists in B. To that end, it would be efficient to create a dictionary out of B. That way, lookup will be constant time. I am assuming here that the first element of A matches to only one element of B.
Transforming B to a dict can be done this way:
transformed_B = { item[0]: item[1] for item in B}
Replacing the elements in A could then be done with:
transformed_A = [[item[0], transformed_B[item[0]]] if item[0] in transformed_B else item for item in A]
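Run on the example from the question, a sketch of the dictionary approach looks like this. It is a small variant that stores the whole row of B as the dictionary value, so rows with more than two columns would also work; the names lookup and result are illustrative. As above, it assumes each first element occurs at most once in B.

```python
A = [[1, 0], [2, 0], [3, 0], [4, 0]]
B = [[2, 0.3], [4, 0.1]]

# Map the first element of each row of B to the full row
lookup = {row[0]: row for row in B}

# Take the row from B when its key matches, otherwise keep the row from A
result = [lookup.get(row[0], row) for row in A]
print(result)  # [[1, 0], [2, 0.3], [3, 0], [4, 0.1]]
```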
Another option is to sort the smallest array and use binary search to find matching values. You can do this in a vectorized manner as follows:
a = np.zeros((1000, 2))
b = np.zeros((100, 2))
a[:, 0] =np.random.randint(200, size=(1000,))
b[:, 0] = np.random.choice(np.arange(100), size=(100,), replace=False)
b[: ,1] = np.random.rand(100)
# sort and binary search
b_sort = b[np.argsort(b[:, 0])]
idx = np.searchsorted(b_sort[:, 0], a[:, 0])
# don't look at indices larger than largest possible in b_sort
mask = idx < b.shape[0]
# check whether the value at the returned index really is the same
mask[mask] &= b_sort[idx[mask], 0] == a[:, 0][mask]
# copy the second column for positions fulfilling both conditions
a[:, 1][mask] = b_sort[idx[mask] ,1]
# only values < 100 should have a second column != 0
>>> a
array([[  7.40000000e+01,   5.38114946e-01],
       [  8.80000000e+01,   9.21309165e-01],
       [  8.60000000e+01,   1.86336715e-01],
       ...,
       [  1.88000000e+02,   0.00000000e+00],
       [  5.00000000e+00,   3.81152557e-01],
       [  1.38000000e+02,   0.00000000e+00]])
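The sort-and-search idea can also be wrapped in a function and checked against the small example from the question. This is a sketch under a few assumptions: b is non-empty, first-column values are unique within b, and whole rows are copied rather than only the second column. It clips the search indices instead of masking them, which is a slightly different but equivalent way to avoid out-of-range lookups. The name replace_matching_rows is illustrative.

```python
import numpy as np

def replace_matching_rows(a, b):
    """Return a copy of a where rows are replaced by the row of b
    that has the same first-column value, if there is one."""
    a = np.array(a, dtype=float)    # copy, so the input is left untouched
    b = np.asarray(b, dtype=float)
    b_sort = b[np.argsort(b[:, 0])]
    # Binary search for each key of a in the sorted keys of b
    idx = np.searchsorted(b_sort[:, 0], a[:, 0])
    # Keys larger than every key in b map past the end; clip them back
    idx = np.minimum(idx, len(b_sort) - 1)
    # Keep only positions where the key found really is the same
    mask = b_sort[idx, 0] == a[:, 0]
    a[mask] = b_sort[idx[mask]]
    return a

A = [[1, 0], [2, 0], [3, 0], [4, 0]]
B = [[2, 0.3], [4, 0.1]]
out = replace_matching_rows(A, B)
print(out)
```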