I have a two-dimensional numpy array x:
import numpy as np
x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (let's take -1), while leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
    consec_dup = np.zeros(x.shape, dtype=bool)
    consec_dup[:, 1:] = np.diff(x, axis=1) == 0
    x[consec_dup] = -1
    return x
# current output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, 5, -1, -1, 3],
# [ 0, 2, -1, -1, -1, 1, -1, 4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, -1, 5, -1, 3],
# [ 0, -1, 2, -1, -1, 1, -1, 4]])
Note that in case of a consecutive duplicate sequence with an even number of occurrences, the left of the two middle values should be left unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1].
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!
The main problem is that you need the length of each run of consecutive values. This is not easy to get with numpy alone, but using itertools.groupby we can solve it with the following code.
import itertools

import numpy as np

x = np.array([
    [1, 2, 8, 4, 5, 5, 5, 3],
    [0, 2, 2, 2, 2, 1, 1, 4]
])

def replace_row(arr: np.ndarray, new_val=-1):
    results = []
    for val, group in itertools.groupby(arr):
        k = len(list(group))                      # length of this run of equal values
        results.extend([new_val] * ((k - 1) // 2))
        results.append(val)                       # keep the (left-)middle occurrence
        results.extend([new_val] * (k // 2))
    return np.fromiter(results, arr.dtype)

if __name__ == '__main__':
    for idx, row in enumerate(x):
        x[idx, :] = replace_row(row)
    print(x)
Output:
[[ 1 2 8 4 -1 5 -1 3]
[ 0 -1 2 -1 -1 1 -1 4]]
This isn't vectorized, but it can be combined with multithreading since every row is handled independently.
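A minimal sketch of that idea (my addition, not part of the original answer), assuming the replace_row function and x defined above: map replace_row over the rows with a thread pool. Note that for pure-Python work the GIL limits how much threads help, so a process pool may be needed in practice.
from concurrent.futures import ThreadPoolExecutor

def replace_consecutive_duplicates_parallel(x: np.ndarray) -> np.ndarray:
    # apply replace_row to every row in parallel and stack the results
    with ThreadPoolExecutor() as pool:
        rows = list(pool.map(replace_row, x))
    return np.vstack(rows)

# usage: replace_consecutive_duplicates_parallel(x)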
I am trying to count labels in a nested list and would appreciate any help on how to go about it. An example of the array I am trying to count is given below. This is not the real array, but a good enough approximation:
x = [[[0,0,1,0,2,3],-1],[[-1,1,1,0,2,5],-1],[[0,0,1,0,2,1],-1],[[0,0,-1,0,2,3],0]]
What I would like is to count all the occurrences of a given integer in the second element of each inner list (to visualize better, I would like to count all occurrences of X in a structure like [[[],X]]).
For instance, counting -1 in the above array would get 3 as a result and not 5. I do not want to get into loops and counters and such naive computations because the arrays I am working with are fairly large. Are there any standard python libraries that deal with such cases?
One approach:
data = [[[0, 0, 1, 0, 2, 3], -1], [[-1, 1, 1, 0, 2, 5], -1], [[0, 0, 1, 0, 2, 1], -1], [[0, 0, -1, 0, 2, 3], 0]]
res = sum(1 for _, y in data if y == -1)
print(res)
Output
3
Alternatively, use collections.Counter if you need to count more than a single element:
from collections import Counter

res = Counter(y for _, y in data)
print(res)
Output
Counter({-1: 3, 0: 1})
A third alternative is to use operator.countOf:
from operator import countOf
res = countOf((y for _, y in data), -1)
print(res)
Output
3
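Since the question mentions the data is fairly large, here is a hedged numpy sketch (my addition, not part of the original answer): pull the labels into an array once, then count with np.count_nonzero.
import numpy as np

labels = np.fromiter((y for _, y in data), dtype=int, count=len(data))
print(np.count_nonzero(labels == -1))  # 3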
You can use collections.Counter:
from collections import Counter
c = Counter((i[1] for i in x))
c[-1]
Output:
>>> c[-1]
3
>>> c
Counter({-1: 3, 0: 1})
x = [
[ [0,0,1,0,2,3], -1],
[ [-1, 1, 1, 0, 2, 5], -1],
[ [0, 0, 1, 0, 2, 1], -1],
[ [0, 0, -1, 0, 2, 3], 0]
]
Each item of the list has the form [[....], x]: an inner list and an integer.
You want to count how many times that x is -1 rather than 0.
def counter(_list, element):
    count = 0
    for item in _list:
        if item[1] == element:
            count += 1
    return count

print(counter(x, -1))
>>> 3
mat = [[ 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10.]
[11. 12. 13. 14. 15.]]
Suppose I have this NumPy array.
Say I need to extract the 2nd column of each row, convert the values into binary, and then create a vector out of each of them.
How can I do it using NumPy?
For instance, if I select the 2nd column of this NumPy array, my output should look as follows:
[[0 0 1 0],
[0 1 1 1],
[1 1 0 0]]
I tried as follows:
my_data = np.genfromtxt('data_input')
print(my_data)
my_data_2nd_column = my_data[:, 1]
my_data_2nd_column_binary = Utils.encode(my_data_2nd_column)
my_2nd_column_binary = np.apply_along_axis(Utils.encode, 1, my_data)
print(my_2nd_column_binary)
Numpy has a built-in function for this. First, you can get a particular column using indexing:
>>> arr
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> arr[:, [1]]
array([[ 2],
[ 7],
[12]])
Then, you could use the built-in function, but make sure you convert to unsigned, 8-bit integers:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0]], dtype=uint8)
Of course, if you need the second dimension to have length 4, just use slicing again, although it is probably worth copying if you are going to do lots of operations on the resulting array:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)[:, -4:]
array([[0, 0, 1, 0],
[0, 1, 1, 1],
[1, 1, 0, 0]], dtype=uint8)
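A small follow-up sketch (my addition, not part of the original answer): instead of hard-coding the 4, the width can be derived from the largest value in the column.
>>> col = arr[:, [1]].astype(np.uint8)
>>> width = int(col.max()).bit_length()   # 4 bits are enough for 2, 7 and 12
>>> np.unpackbits(col, axis=1)[:, -width:]
array([[0, 0, 1, 0],
       [0, 1, 1, 1],
       [1, 1, 0, 0]], dtype=uint8)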
I did this example without using the numpy library, and I commented all the functions.
mat = [[ 1, 2, 3, 4, 1,],
[ 6, 7, 8, 9, 40,],
[11, 12, 13, 14, 15,]]
# convert a binary string into a vector of digits
def split(word):
    return [int(char) for char in word]

# return the vector size of the largest binary
def binaryBig(lista):
    maior = max(lista, key=int)
    temp = "{0:b}".format(maior)
    return len(split(temp))

# convert the element to binary, left-padded to the common width
def binary(x, big):
    temp = split(format(x, "b"))
    for n in range(len(temp), big):
        temp.insert(0, 0)
    return temp

# create the matrix with the binaries
def createBinaryMat(lista):
    big = binaryBig(lista)
    mat = []
    for i in lista:
        mat.append(binary(i, big))
    return mat

# select the column and return the created matrix
def binaryElementsOfColum(colum, mat):
    lista = []
    for i in mat:
        lista.append(i[colum])
    return createBinaryMat(lista)

for i in binaryElementsOfColum(4, mat):
    print(i)
Output:
[0, 0, 0, 0, 0, 1]
[1, 0, 1, 0, 0, 0]
[0, 0, 1, 1, 1, 1]
I need to get the dot product of many vectors with one vector. Example code:
a = np.array([0, 1, 2])
b = np.array([
    [0, 1, 2],
    [4, 5, 6],
    [-1, 0, 1],
    [-3, -2, 1]
])
I would like to get the dot product of each row of b against a. I can iterate:
result = []
for row in b:
    result.append(np.dot(row, a))
print(result)
which gives:
[5, 17, 2, 0]
How can I get this without iterating? Thanks!
Use numpy.dot or numpy.matmul without a for loop:
import numpy as np
np.matmul(b, a)
# or
np.dot(b, a)
Output:
array([ 5, 17, 2, 0])
I will just do @:
b @ a
Out[108]: array([ 5, 17,  2,  0])
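For context (my note, not part of the original answer): @ is Python's matrix-multiplication operator (PEP 465), so b @ a is equivalent to np.matmul(b, a) here.
import numpy as np

a = np.array([0, 1, 2])
b = np.array([[0, 1, 2], [4, 5, 6], [-1, 0, 1], [-3, -2, 1]])

print(b @ a)            # [ 5 17  2  0]
print(np.matmul(b, a))  # [ 5 17  2  0]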
I have an np.array of shape (15, 3).
final_vals = array([[ 37, -84, -143],
[ 29, 2, -2],
[ -18, -2, 0],
[ -3, 6, 0],
[ 361, -5, 2],
[ -23, 4, 8],
[ 0, -1, 0],
[ -1, 1, 0],
[ 62, 181, 83],
[-193, -14, -2],
[ 42, -154, -92],
[ 16, -13, 1],
[ -10, -3, 0],
[-299, 244, 110],
[ 223, -237, -110]])
I am trying to find the rows whose element values are all between -1 and 1. In the array printed above, ROW-6 and ROW-7 are the target/result rows.
I tried:
result_idx = np.where(np.logical_and(final_vals>=-1, final_vals<=1))
which returns,
result_idx = (array([ 2, 3, 6, 6, 6, 7, 7, 7, 11, 12], dtype=int64),
array([2, 2, 0, 1, 2, 0, 1, 2, 2, 2], dtype=int64))
I want my program to return only the row numbers.
You could take the absolute value of all elements and check which rows' elements are all smaller than or equal to 1. Then use np.flatnonzero to find the indices of the rows where all columns fulfill the condition:
np.flatnonzero((np.abs(final_vals) <= 1).all(axis=1))
Output
array([6, 7], dtype=int64)
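For readability, the same thing split into two steps (my addition, not part of the original answer):
mask = (np.abs(final_vals) <= 1).all(axis=1)   # one boolean per row
print(np.flatnonzero(mask))                    # [6 7]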
Another way to do this based on your approach is to find the truth value of each element and then use numpy.all for each row. Then numpy.where gets you what you want.
mask = (final_vals <= 1) * (final_vals >= -1)
np.where(np.all(mask, axis=1))
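A small usage note (my addition, not part of the original answer): np.where returns a tuple of index arrays, so take its first element to get a plain array of row numbers.
rows = np.where(np.all(mask, axis=1))[0]
print(rows)  # [6 7]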
How about
np.where(np.all((-1<=final_vals) & (final_vals<=1),axis=1))
You could use np.argwhere:
r = np.logical_and(final_vals <= 1, final_vals >=-1)
result = np.argwhere(r.all(1)).flatten()
print(result)
Output
[6 7]
Another way is using pandas; you can get the rows with the following code:
import pandas as pd

df = pd.DataFrame(final_vals)
temp= ((df>=-1) & (df<=1 )).product(axis=1)
rows = temp[temp!=0].keys()
rows
First it checks which numbers are between -1 and +1, and then (with axis=1) it checks which rows have all values satisfying the condition.
The result is:
Int64Index([ 6, 7], dtype='int64')
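If plain Python integers are preferred over the Int64Index, a small follow-up (my addition):
print(rows.tolist())  # [6, 7]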
Just a simple list comprehension:
[i for i, row in enumerate(final_vals) if all(e >= -1 and e <= 1 for e in row)]
#=> [6, 7]