I am trying to count labels in a nested list and would appreciate any help on how to go about it. An example of the array I am trying to count is given below. This is not the real array, but a good enough approximation:
x = [[[0,0,1,0,2,3],-1],[[-1,1,1,0,2,5],-1],[[0,0,1,0,2,1],-1],[[0,0,-1,0,2,3],0]]
What I would like is to count all occurrences of a given integer as the second element of each inner pair (to visualize it better: I would like to count all occurrences of X in items shaped like [[...], X]).
For instance, counting -1 in the above array should give 3, not 5. I do not want to resort to loops, counters, and similar naive computations because the arrays I am working with are fairly large. Are there any standard Python libraries that handle such cases?
One approach:
data = [[[0, 0, 1, 0, 2, 3], -1], [[-1, 1, 1, 0, 2, 5], -1], [[0, 0, 1, 0, 2, 1], -1], [[0, 0, -1, 0, 2, 3], 0]]
res = sum(1 for _, y in data if y == -1)
print(res)
Output
3
Alternatively, use collections.Counter if you need to count more than a single element:
from collections import Counter
res = Counter(y for _, y in data)
print(res)
Output
Counter({-1: 3, 0: 1})
A third alternative is to use operator.countOf:
from operator import countOf
res = countOf((y for _, y in data), -1)
print(res)
Output
3
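If you prefer not to write the generator expression yourself, operator.itemgetter can pull out the label of each pair. This is only a small variation on the Counter approach, not a different algorithm: it still visits every element once. Counter.most_common then lists all labels by frequency.

```python
from collections import Counter
from operator import itemgetter

data = [[[0, 0, 1, 0, 2, 3], -1], [[-1, 1, 1, 0, 2, 5], -1],
        [[0, 0, 1, 0, 2, 1], -1], [[0, 0, -1, 0, 2, 3], 0]]

# itemgetter(1) picks the label (second element) of each [list, label] pair
label_counts = Counter(map(itemgetter(1), data))

print(label_counts[-1])            # 3
print(label_counts.most_common())  # [(-1, 3), (0, 1)]
```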
You can use collections.Counter:
from collections import Counter
c = Counter((i[1] for i in x))
c[-1]
output:
>>> c[-1]
3
>>> c
Counter({-1: 3, 0: 1})
x = [
[ [0,0,1,0,2,3], -1],
[ [-1, 1, 1, 0, 2, 5], -1],
[ [0, 0, 1, 0, 2, 1], -1],
[ [0, 0, -1, 0, 2, 3], 0]
]
Each item of the list has the form [[...], x]: a list followed by an integer.
You want to count how many times that integer x is -1.
def counter(_list, element):
    count = 0
    for item in _list:
        if item[1] == element:
            count += 1
    return count

print(counter(x, -1))
# 3
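For large inputs, one option (an assumption about your setup, not something the answers above rely on) is to pull the labels into a NumPy array once and then count with vectorized operations. Building the array is still a single Python-level pass, so this mainly pays off if you count several labels or reuse the array.

```python
import numpy as np

x = [[[0, 0, 1, 0, 2, 3], -1], [[-1, 1, 1, 0, 2, 5], -1],
     [[0, 0, 1, 0, 2, 1], -1], [[0, 0, -1, 0, 2, 3], 0]]

# One pass to collect the second element of every pair into an int array
labels = np.fromiter((item[1] for item in x), dtype=int, count=len(x))

# Count a single label
n_minus_one = np.count_nonzero(labels == -1)

# Count every distinct label at once
values, counts = np.unique(labels, return_counts=True)

print(n_minus_one)                                 # 3
print(dict(zip(values.tolist(), counts.tolist())))  # {-1: 3, 0: 1}
```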
I would like to know the fastest way to extract the indices of the first n non zero values per column in a 2D array.
For example, with the following array:
import numpy as np
arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])
With n=2 I would get [0, 0, 1, 1, 2] as xs (column indices) and [0, 3, 2, 5, 3] as ys (row indices): 2 values in each of the first and second columns and 1 in the third.
Here is how it is currently done:
x = []
y = []
n = 3
for i, c in enumerate(arr.T):
    a = c.nonzero()[0][:n]
    if len(a):
        x.extend([i]*len(a))
        y.extend(a)
In practice I have arrays of size (405, 256).
Is there a way to make it faster?
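For timing comparisons, here is the loop from the question wrapped in a self-contained function on the example array. The name first_n_nonzero_loop is hypothetical, chosen only for illustration; with n=2 it reproduces the expected xs and ys.

```python
import numpy as np

arr = np.array([
    [4, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 4, 0, 0],
    [2, 0, 9, 0],
    [6, 0, 0, 0],
    [0, 7, 0, 0],
    [3, 0, 0, 0],
    [1, 2, 0, 0],
])

def first_n_nonzero_loop(arr, n):
    """Column and row indices of the first n non-zero values per column."""
    xs, ys = [], []
    for i, col in enumerate(arr.T):      # iterate over columns
        rows = col.nonzero()[0][:n]       # row indices of the first n non-zeros
        xs.extend([i] * len(rows))
        ys.extend(rows.tolist())
    return xs, ys

print(first_n_nonzero_loop(arr, 2))  # ([0, 0, 1, 1, 2], [0, 3, 2, 5, 3])
```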
Here is a method that does not require sorting the array (only a linear scan is needed to find the non-zero values), although it is a bit convoluted because it chains several functions:
n = 2
# Get indices of non-zero values, column indices first
nnull = np.stack(np.where(arr.T != 0))
# split indices by unique value of column
cols_ids = np.array_split(range(len(nnull[0])), np.where(np.diff(nnull[0]) > 0)[0] + 1)
# Take (at most) n in each and concatenate the whole
np.concatenate([nnull[:, u[:n]] for u in cols_ids], axis=1)
outputs:
array([[0, 0, 1, 1, 2],
[0, 3, 2, 5, 3]], dtype=int64)
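Here is one approach using argsort; note that it returns the results in a different order:
n = 2
m = arr != 0
# non-zero values first; a stable sort keeps the original row order within each column
idx = np.argsort(~m, axis=0, kind='stable')
# take the first n rows and keep only those that are actually non-zero
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y, x = np.where(m2)
# map positions in the sorted array back to the original row indices
x, idx[y, x]
# (array([0, 1, 2, 0, 1]), array([0, 2, 3, 3, 5]))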
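If you need the column-major order produced by the other answers, the argsort result can be reordered with np.lexsort. This sketch also passes kind="stable" to argsort: the default sort is not guaranteed to be stable, and without stability the non-zero rows of a column are not guaranteed to come out in their original order.

```python
import numpy as np

arr = np.array([[4, 0, 0, 0], [0, 0, 0, 0], [0, 4, 0, 0], [2, 0, 9, 0],
                [6, 0, 0, 0], [0, 7, 0, 0], [3, 0, 0, 0], [1, 2, 0, 0]])
n = 2
m = arr != 0
# Stable sort: within each column, non-zero rows keep their original order
idx = np.argsort(~m, axis=0, kind="stable")
m2 = np.take_along_axis(m, idx, axis=0)[:n]
y, x = np.where(m2)
rows = idx[y, x]

# np.lexsort sorts by the last key first: column index, then row index
order = np.lexsort((rows, x))
print(x[order], rows[order])  # [0 0 1 1 2] [0 3 2 5 3]
```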
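Use a shifted comparison on the column indices returned by nonzero() for the transposed array: an entry is kept if it is one of the first n entries, or if the entry n positions earlier belongs to a different column: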
>>> n = 2
>>> i, j = arr.T.nonzero()
>>> mask = np.concatenate([[True] * n, i[n:] != i[:-n]])
>>> i[mask], j[mask]
(array([0, 0, 1, 1, 2], dtype=int64), array([0, 3, 2, 5, 3], dtype=int64))
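The shifted comparison can be packaged as a reusable function. This is a sketch under two assumptions: the function name first_n_nonzero_shift is hypothetical, and n must be at least 1 (with n=0 the slice cols[:-0] would be empty and the shapes would not match).

```python
import numpy as np

def first_n_nonzero_shift(arr, n):
    """First n non-zero entries per column, via a shifted comparison (n >= 1)."""
    # Transposing makes nonzero() return entries grouped by column,
    # with row indices increasing inside each column.
    cols, rows = arr.T.nonzero()
    # Keep entry k if k < n, or if entry k - n belongs to a different column,
    # i.e. fewer than n earlier entries share its column.
    keep = np.ones(len(cols), dtype=bool)
    keep[n:] = cols[n:] != cols[:-n]
    return cols[keep], rows[keep]

arr = np.array([[4, 0, 0, 0], [0, 0, 0, 0], [0, 4, 0, 0], [2, 0, 9, 0],
                [6, 0, 0, 0], [0, 7, 0, 0], [3, 0, 0, 0], [1, 2, 0, 0]])
print(first_n_nonzero_shift(arr, 2))
```

Since the work is one nonzero() call plus a vectorized comparison, it avoids the per-column Python loop entirely.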
Suppose I have this NumPy array:
mat = np.array([[ 1.,  2.,  3.,  4.,  5.],
                [ 6.,  7.,  8.,  9., 10.],
                [11., 12., 13., 14., 15.]])
Say I need to take the 2nd column (one value per row), convert each value into binary, and build a bit vector from each one.
How can I do it using NumPy?
For instance, if I select the 2nd column of this NumPy array, my output should look as follows:
[[0 0 1 0],
[0 1 1 1],
[1 1 0 0]]
I tried as follows:
my_data = np.genfromtxt('data_input')
print(my_data)
my_data_2nd_column = my_data[:, 1]
my_data_2nd_column_binary = Utils.encode(my_data_2nd_column)
my_2nd_column_binary = np.apply_along_axis(Utils.encode, 1, my_data)
print(my_2nd_column_binary)
NumPy has a built-in function for this. First, you can get a particular column using indexing:
>>> arr
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> arr[:, [1]]
array([[ 2],
[ 7],
[12]])
Then, you could use the built-in function, but make sure you convert to unsigned, 8-bit integers:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0]], dtype=uint8)
Of course, if you only need the last 4 bits (a second dimension of length 4), just slice again, although it is probably worth copying the result if you are going to do lots of operations on it:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)[:, -4:]
array([[0, 0, 1, 0],
[0, 1, 1, 1],
[1, 1, 0, 0]], dtype=uint8)
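np.unpackbits only works on uint8, so values above 255 do not fit and would wrap around when converted with astype(np.uint8). A sketch that avoids that limit uses right shifts and a bitwise AND; the function name column_to_bits and the n_bits parameter are illustrative assumptions (4 here, to match the question).

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10],
                [11, 12, 13, 14, 15]])

def column_to_bits(arr, col, n_bits):
    """Bits of arr[:, col], most significant bit first, one row per value."""
    values = arr[:, col].astype(np.int64)
    # Shift amounts n_bits-1, ..., 1, 0 so the MSB lands in the first column
    shifts = np.arange(n_bits - 1, -1, -1)
    # Broadcasting (rows, 1) against (n_bits,) gives a (rows, n_bits) array
    return (values[:, None] >> shifts) & 1

print(column_to_bits(arr, 1, 4))
# [[0 0 1 0]
#  [0 1 1 1]
#  [1 1 0 0]]
```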
I did this example without using the NumPy library.
I commented every function.
mat = [[ 1, 2, 3, 4, 1,],
[ 6, 7, 8, 9, 40,],
[11, 12, 13, 14, 15,]]
# convert the binary into a vector of elements
def split(word):
    return [int(char) for char in word]

# returns the vector size of the largest binary
def binaryBig(lista):
    maior = max(lista, key=int)
    temp = "{0:b}".format(maior)
    return len(split(temp))

# convert the element to binary
def binary(x, big):
    temp = split(format(x, "b"))
    for n in range(len(temp), big):
        temp.insert(0, 0)
    return temp

# create the matrix with the binaries
def createBinaryMat(lista):
    big = binaryBig(lista)
    mat = []
    for i in lista:
        mat.append(binary(i, big))
    return mat

# select the column and return the created matrix
def binaryElementsOfColum(colum, mat):
    lista = []
    for i in mat:
        lista.append(i[colum])
    return createBinaryMat(lista)

for i in binaryElementsOfColum(4, mat):
    print(i)
Output:
[0, 0, 0, 0, 0, 1]
[1, 0, 1, 0, 0, 0]
[0, 0, 1, 1, 1, 1]
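The same pure-Python idea can be written more compactly with int.bit_length and a zero-padded format spec. This is a sketch assuming non-negative integers; the name column_to_bit_lists is hypothetical.

```python
mat = [[1, 2, 3, 4, 1],
       [6, 7, 8, 9, 40],
       [11, 12, 13, 14, 15]]

def column_to_bit_lists(mat, col):
    column = [row[col] for row in mat]
    # Bit width of the largest value (at least 1, so an all-zero column still works)
    width = max(1, max(column).bit_length())
    # format(v, "0<width>b") left-pads the binary string with zeros
    return [[int(b) for b in format(v, f"0{width}b")] for v in column]

for bits in column_to_bit_lists(mat, 4):
    print(bits)
```

It prints the same three rows as the longer version.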
I have a list:
hash_table = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
I want to change this to:
result = [[0, 0], [1, 2], [4, 5]]
How the result is generated:
array: [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
map: 0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
# start to end, generate the result like `[int(start), int(end)]`
combine them:[[0, 0], [1, 2], [4, 5]]
0s and 1s won't appear in pairs, so the numbers in the result will always be integers.
What I have tried:
hash_table = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
output = [[]]
for pre, next_ in zip(hash_table, hash_table[1:]):
    output[-1].append(pre)
    if {next_, pre} == {0, 1}:
        output.append([])
output[-1].append(hash_table[-1])
# the output is [[1], [0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
start = index = 0
result = []
while index < len(output):
    # output[index]
    if output[0] != 0:
        res.append([start, math.ceil(len(output[index]))])
    # I don't know how to handle the list "output".
    # I couldn't know it. My mind has gone blank
    start += len(output[index])/2
Any good ideas? I think I made it too complicated.
You can use itertools.groupby to group the 0s and 1s:
import itertools

hash_table = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
result = []
cur_ind = 0
for (val, vals) in itertools.groupby(hash_table):
    vals = list(vals)  # itertools doesn't make it a list by default
    old_ind = cur_ind
    cur_ind += len(vals)
    if val == 0:
        continue
    result.append([old_ind // 2, (cur_ind - 1) // 2])
print(result)
Essentially, itertools.groupby gives an iterator of [(1, [1]), (0, [0]), (1, [1, 1, 1]), (0, [0, 0, 0]), (1, [1, 1, 1])] (more or less; each group is itself an iterator). We iterate through it and keep track of the index we're on by adding the length of each group to the current index. If the value is 1, we have a run of ones, so we append it to the results. old_ind // 2 is integer (floor) division, equivalent to int(old_ind / 2) for non-negative numbers.
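If the list is large, a vectorized alternative (a NumPy technique swapped in here, not part of the answer above) finds the run boundaries with np.diff: padding with a 0 on each side makes every run of 1s produce a +1 where it starts and a -1 just after it ends. This sketch assumes the input contains only 0s and 1s.

```python
import numpy as np

hash_table = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

a = np.asarray(hash_table)
# Pad with zeros so runs touching either end still produce a start and an end
d = np.diff(np.concatenate(([0], a, [0])))
starts = np.flatnonzero(d == 1)       # index where each run of 1s begins
ends = np.flatnonzero(d == -1) - 1    # index where each run of 1s ends
# Halve the indices, as in the 0, 0.5, 1.0, ... mapping from the question
result = np.column_stack((starts // 2, ends // 2)).tolist()
print(result)  # [[0, 0], [1, 2], [4, 5]]
```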
You could use groupby from the itertools library:
import itertools

def runs_of_ones(hash_table):
    s = "".join(map(str, hash_table))  # s = "10111000111"
    gs = [(i, list(g)) for i, g in itertools.groupby(s)]
    idx, result = 0, []
    for i, g in gs:  # i is '1' or '0' (i.e., whether the group consists of 1s or 0s)
        if i == '1':
            result.append([idx // 2, (idx + len(g) - 1) // 2])
        idx += len(g)
    return result

hash_table = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
print(runs_of_ones(hash_table))  # [[0, 0], [1, 2], [4, 5]]
I am implementing a homomorphic encryption algorithm and need to convert a matrix like this
[[3 1 3]
[3 2 3]
[0 1 0]]
which splits a vector of integers ≤ q into a vector about log2(q) times longer, made of the bits of the integers, like:
[[0 1 1 0 0 1 0 1 1]
[0 1 1 0 1 0 0 1 1]
[0 0 0 0 0 1 0 0 0]]
Then it can be calculated as a normal matrix, and the final result can be converted from the binary to integer form.
I tried some NumPy functions that convert matrix elements to binary, but I couldn't get what I wanted.
You can do it with np.unpackbits.
>>> matrix = np.array([3,1,3,3,2,3,0,1,0],'uint8').reshape(3,-1)
>>> matrix
array([[3, 1, 3],
[3, 2, 3],
[0, 1, 0]], dtype=uint8)
>>> np.unpackbits(matrix.reshape(3,-1,1),2)[:,:,-3:].reshape(3,-1)
array([[0, 1, 1, 0, 0, 1, 0, 1, 1],
[0, 1, 1, 0, 1, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 0]], dtype=uint8)
np.unpackbits unpacks your ints into 8 bits each, but since you only seem to be interested in the 3 least significant bits, we unpack along a new axis and use the slice [:,:,-3:] to strip out the padding zeros.
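Since the question also needs to convert the binary result back to integers, here is a sketch of both directions that is not limited to uint8. The helper names to_bits and from_bits and the bit count k are assumptions for illustration; k would be chosen so that every entry fits (for example k = 3 here), and entries are assumed non-negative.

```python
import numpy as np

def to_bits(matrix, k):
    """Replace each entry by its k bits (MSB first): shape (r, c) -> (r, c*k)."""
    shifts = np.arange(k - 1, -1, -1)
    bits = (matrix[..., None] >> shifts) & 1   # shape (r, c, k)
    return bits.reshape(matrix.shape[0], -1)

def from_bits(bits, k):
    """Inverse of to_bits: group every k bits and weight them by powers of 2."""
    weights = 1 << np.arange(k - 1, -1, -1)    # [2**(k-1), ..., 2, 1]
    grouped = bits.reshape(bits.shape[0], -1, k)
    return grouped @ weights                   # shape (r, c)

matrix = np.array([[3, 1, 3],
                   [3, 2, 3],
                   [0, 1, 0]])
b = to_bits(matrix, 3)
print(b)
print(from_bits(b, 3))
```

The first print gives the 3x9 bit matrix from the question, and the second recovers the original matrix.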
Here is one way of doing it:
import itertools

def expand_to_binary(my_list, q):
    my_list = [list(('{0:0' + str(q) + 'b}').format(elem)) for elem in my_list]
    my_list = [list(map(int, elem)) for elem in my_list]
    my_list = list(itertools.chain(*my_list))
    return my_list

x = [[3, 1, 3], [3, 2, 3], [0, 1, 0]]
x = [expand_to_binary(elem, 3) for elem in x]
Here q is the number of bits in each binary number. This is only the forward pass, but implementing the reverse shouldn't be too difficult.
And this would be one way of implementing the reverse:
def decode_binary_to_int(my_list, q):
    my_list = [list(map(str, my_list[i: i+q])) for i in range(0, len(my_list), q)]
    my_list = [''.join(elem) for elem in my_list]
    my_list = [int(elem, 2) for elem in my_list]
    return my_list

x = [[0, 1, 1, 0, 0, 1, 0, 1, 1], [0, 1, 1, 0, 1, 0, 0, 1, 1], [0, 0, 0, 0, 0, 1, 0, 0, 0]]
x = [decode_binary_to_int(elem, 3) for elem in x]
Although this code works, it's probably not the fastest way to implement what you want; I just wanted to give an example of what you asked for.
I have a 2D NumPy array with about 12 columns and 1000+ rows, and each cell contains a number from 1 to 5. I'm searching for the best sextuple of columns according to my point system, where 1 and 2 give -1 point, 3 gives 0, and 4 and 5 give +1.
If a row in a certain sextuple contains, for example, [1, 4, 5, 3, 4, 3] the point for this row should be +2, because 3*1 + 1*(-1) = 2. Next row may be [1, 2, 2, 3, 3, 3] and should be -3 points.
At first, I tried a straightforward loop solution, but I realized there are 665 280 possible combinations of columns to compare, and since I also need to search for the best quintuple, quadruple, etc., the loop takes forever.
Is there perhaps a smarter numpy-way of solving my problem?
import numpy as np
import itertools
N_rows = 10
arr = np.random.randint(1, 6, size=(N_rows, 12))  # values 1..5
x = np.array([0,-1,-1,0,1,1])
y = x[arr]
print(y)
score, best_sextuple = max((y[:,cols].sum(), cols)
for cols in itertools.combinations(range(12),6))
print('''\
score: {s}
sextuple: {c}
'''.format(s = score, c = best_sextuple))
yields, for example,
score: 6
sextuple: (0, 1, 5, 8, 10, 11)
Explanation:
First, let's generate a random example, with 12 columns and 10 rows:
N_rows = 10
arr = np.random.randint(1, 6, size=(N_rows, 12))  # values 1..5
Now we can use numpy indexing to convert the numbers in arr 1,2,...,5 to the values -1,0,1 (according to your scoring system):
x = np.array([0,-1,-1,0,1,1])
y = x[arr]
Next, let's use itertools.combinations to generate all possible combinations of 6 columns:
for cols in itertools.combinations(range(12),6)
and
y[:,cols].sum()
then gives the score for cols, a choice of columns (sextuple).
Finally, use max to pick off the sextuple with the best score:
score, best_sextuple = max((y[:,cols].sum(), cols)
for cols in itertools.combinations(range(12),6))
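Because the score of a set of columns is just the sum of the per-column totals, the brute-force search can be replaced by picking the k columns with the largest totals, for any k (sextuple, quintuple, and so on). This sketch checks that against the itertools search; the random data (np.random.default_rng with an arbitrary seed, a 1000x12 shape) and the helper name best_columns are assumptions for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(1, 6, size=(1000, 12))      # values 1..5
scores = np.array([0, -1, -1, 0, 1, 1])[arr]    # 1,2 -> -1; 3 -> 0; 4,5 -> +1

col_totals = scores.sum(axis=0)

def best_columns(col_totals, k):
    """Score and sorted indices of the k columns with the largest totals."""
    cols = np.argsort(col_totals)[::-1][:k]
    return int(col_totals[cols].sum()), sorted(cols.tolist())

# Brute force over all combinations, as in the answer above
brute_score, brute_cols = max((int(scores[:, cols].sum()), cols)
                              for cols in itertools.combinations(range(12), 6))

fast_score, fast_cols = best_columns(col_totals, 6)
print(brute_score, fast_score)  # the two scores are equal
```

If several columns tie, the two methods may return different column sets with the same score.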
import numpy
A = numpy.random.randint(1, 6, size=(1000, 12))
points = -1*(A == 1) + -1*(A == 2) + 1*(A == 4) + 1*(A == 5)
columnsums = numpy.sum(points, 0)
def best6(row):
    return numpy.argsort(row)[-6:]
bestcolumns = best6(columnsums)
allbestcolumns = list(map(best6, points))
bestcolumns will now contain the indices of the best 6 columns, ordered from lowest to highest total score. By similar logic, allbestcolumns will contain the best six columns in each row. (In Python 3, map returns a lazy iterator, hence the list call.)
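In Python 3, map returns a lazy iterator rather than a list. A vectorized alternative (a sketch, not from the original answer) sorts every row at once with argsort along axis=1; the random example data is an assumption.

```python
import numpy as np

# Example data (assumed): 1000 rows, 12 columns, values 1..5
rng = np.random.default_rng(1)
A = rng.integers(1, 6, size=(1000, 12))
points = -1*(A == 1) + -1*(A == 2) + 1*(A == 4) + 1*(A == 5)

# Sort each row's scores at once; the last 6 indices are that row's best columns
allbestcolumns = np.argsort(points, axis=1)[:, -6:]
print(allbestcolumns.shape)  # (1000, 6)
```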
Extending unutbu's longer answer above, it's possible to generate the array of scores automatically with boolean masks. Because the score for each value is the same on every pass through the loop, the scores only need to be calculated once. Here's a slightly inelegant way to do it on an example 6x10 array, before and after your scores are applied.
>>> import numpy
>>> values = numpy.random.randint(6, size=(6,10))
>>> values
array([[4, 5, 1, 2, 1, 4, 0, 1, 0, 4],
[2, 5, 2, 2, 3, 1, 3, 5, 3, 1],
[3, 3, 5, 4, 2, 1, 4, 0, 0, 1],
[2, 4, 0, 0, 4, 1, 4, 0, 1, 0],
[0, 4, 1, 2, 0, 3, 3, 5, 0, 1],
[2, 3, 3, 4, 0, 1, 1, 1, 3, 2]])
>>> b = values.copy()
>>> b[ b<3 ] = -1
>>> b[ b==3 ] = 0
>>> b[ b>3 ] = 1
>>> b
array([[ 1, 1, -1, -1, -1, 1, -1, -1, -1, 1],
[-1, 1, -1, -1, 0, -1, 0, 1, 0, -1],
[ 0, 0, 1, 1, -1, -1, 1, -1, -1, -1],
[-1, 1, -1, -1, 1, -1, 1, -1, -1, -1],
[-1, 1, -1, -1, -1, 0, 0, 1, -1, -1],
[-1, 0, 0, 1, -1, -1, -1, -1, 0, -1]])
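Since the scores only depend on whether a value is below, equal to, or above 3, np.sign(values - 3) builds the same -1/0/+1 array in one step, without the copy and three masked assignments. Like the masked version, it maps 0 (which randint(6) can produce) to -1. The seeded generator is an assumption so the comparison is reproducible.

```python
import numpy as np

values = np.random.default_rng(2).integers(6, size=(6, 10))

# Masked-assignment version from the answer above
b = values.copy()
b[b < 3] = -1
b[b == 3] = 0
b[b > 3] = 1

# One-step version: sign of the distance from 3
scores = np.sign(values - 3)

print(np.array_equal(b, scores))  # True
```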
Incidentally, this thread claims that creating the combinations directly within numpy will yield around 5x faster performance than itertools, though perhaps at the expense of some readability.
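A sketch of moving the scoring itself into NumPy: the combinations are still generated with itertools (this is not a pure-NumPy combination generator), but all 924 sextuples are scored in one indexing operation on the column totals instead of one slice-and-sum per combination. The random data is an assumption for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
arr = rng.integers(1, 6, size=(1000, 12))
y = np.array([0, -1, -1, 0, 1, 1])[arr]

# All 924 sextuples as a (924, 6) integer array
combos = np.array(list(itertools.combinations(range(12), 6)))

# Per-column totals, then the score of every combination in one indexing step
col_totals = y.sum(axis=0)
combo_scores = col_totals[combos].sum(axis=1)

best = combo_scores.argmax()
print(int(combo_scores[best]), combos[best].tolist())
```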