How to count occurrences in a two-dimensional array in Python? - python

Code Start
from collections import Counter
import numpy as np
List = [[7,12,17,26,29,31],\
[4,9,11,17,26,27],\
[5,6,8,21,31,33],\
[3,17,21,23,27,28],\
[4,10,18,19,25,27],\
[5,8,13,19,27,28],\
[15,16,21,22,27,33],\
[11,12,13,14,18,33],\
[2,8,10,18,20,33],\
[2,7,10,20,27,29],\
]
for i in List:
    print(i, List.count(i), (List.count(i)/len(List)))
Code End
Result
[7, 12, 17, 26, 29, 31] 1 0.1
[4, 9, 11, 17, 26, 27] 1 0.1
[5, 6, 8, 21, 31, 33] 1 0.1
[3, 17, 21, 23, 27, 28] 1 0.1
[4, 10, 18, 19, 25, 27] 1 0.1
[5, 8, 13, 19, 27, 28] 1 0.1
[15, 16, 21, 22, 27, 33] 1 0.1
[11, 12, 13, 14, 18, 33] 1 0.1
[2, 8, 10, 18, 20, 33] 1 0.1
[2, 7, 10, 20, 27, 29] 1 0.1
Question
How can I get a result like this? I want to count the occurrences of every element, one element per line.
2 2 0.2
3 1 0.1
4 2 0.2
5 1 0.1
...
33 4 0.4
I tried many different ways but always receive the same result.
As I am a newbie to Python, I hope someone can help me figure this out.
BTW, if there is any book that explains openpyxl, list manipulation and computational science clearly, please recommend it.
Thank you very much in advance.

What you are doing right now is counting the occurrences of each row (as a list) within the outer list. Since each row appears only once, you get a count of 1.
First we have to get all the numbers that appear in the array:
unique = np.unique(List)
Then we can loop over the rows and count how many rows each number appears in:
counts = {u:0 for u in unique}
List = np.asarray(List)
for i in unique:
    for row in List:
        if i in row:
            counts[i]+=1
Lastly, if you want to print the results:
for k,v in counts.items():
    print(k,v,v/len(List))
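Since the question already imports Counter, the same per-row counting can also be sketched with the standard library alone. This is an illustrative alternative, not the code from the answer above. The helper name count_rows_containing is hypothetical, and each row is turned into a set so that a number is counted at most once per row, which mirrors the `if i in row` check.

```python
from collections import Counter

def count_rows_containing(rows):
    """Return a Counter mapping each value to the number of rows it appears in."""
    counts = Counter()
    for row in rows:
        counts.update(set(row))  # set(): count a value at most once per row
    return counts

rows = [
    [7, 12, 17, 26, 29, 31],
    [4, 9, 11, 17, 26, 27],
    [5, 6, 8, 21, 31, 33],
    [3, 17, 21, 23, 27, 28],
    [4, 10, 18, 19, 25, 27],
    [5, 8, 13, 19, 27, 28],
    [15, 16, 21, 22, 27, 33],
    [11, 12, 13, 14, 18, 33],
    [2, 8, 10, 18, 20, 33],
    [2, 7, 10, 20, 27, 29],
]

counts = count_rows_containing(rows)
# Print value, count, and fraction of rows, in ascending order of value.
for value in sorted(counts):
    print(value, counts[value], counts[value] / len(rows))
```

Because no number repeats within a row in this data, counting rows and counting total occurrences give the same totals here.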

Related

Find the nearest number(but not exceed) in the list and change numbers in the series(Python)

There is a series like the one below:
s = pd.Series([25, 33, 39])
0 25
1 33
2 39
dtype: int64
and a list
list = [5, 10, 20, 26, 30, 31, 32, 35, 40]
[5, 10, 20, 26, 30, 31, 32, 35, 40]
I'd like to find the nearest number in the list and change the number in the series.
For example, the first number in the series is 25,
but the list is [5, 10, 20, 26, 30, 31, 32, 35, 40],
so the nearest number (corresponding to 25 in the series)
is 20. (Actually 26 is the nearest number, but I need a number less than 25.)
Then the second number is 31 and the third is 35.
After finding the numbers and changing them in the series,
the desired output s is
0 20
1 31
2 35
Please give me advice. It's a very important task for me.
If possible, without a for loop, please.
You are looking for merge_asof:
import pandas as pd
s = pd.Series([25, 33, 39], name="s")
l = pd.Series([5, 10, 20, 26, 30, 31, 32, 35, 40], name="l")
pd.merge_asof(s, l, left_on="s", right_on="l")
A few notes:
There is a bug in your expected output. The closest number to 33 that does not exceed it is 32, not 31.
Don't name your variable list. It shadows the built-in list type.
Make sure both s and l are sorted; merge_asof requires sorted keys.
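The values that a backward as-of match should produce can be cross-checked in plain Python with the standard bisect module. This is only a sketch for verifying the expected numbers, not a replacement for merge_asof. The helper name floor_values is hypothetical, and it assumes the candidate list is sorted in ascending order.

```python
from bisect import bisect_right

def floor_values(queries, candidates):
    """For each query, return the largest candidate that does not exceed it.

    Assumes `candidates` is sorted ascending.
    """
    result = []
    for q in queries:
        idx = bisect_right(candidates, q)  # number of candidates <= q
        if idx == 0:
            raise ValueError(f"no candidate <= {q}")
        result.append(candidates[idx - 1])
    return result

print(floor_values([25, 33, 39], [5, 10, 20, 26, 30, 31, 32, 35, 40]))
# prints [20, 32, 35]
```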

Sort for Matrix

I have a problem sorting a matrix.
I need to create an MxM matrix, with the size taken from input, as nested lists filled using randrange.
from random import randrange
matrix_size = int(input("Enter size of the matrix: "))
matrix = [[randrange(1, 51) for column in range(matrix_size)] for row in range(matrix_size)]
Next, I need to find the sum of each column of the matrix, so I do this:
for i in range(matrix_size):
    sum_column = 0
    for j in range(matrix_size):
        sum_column += matrix[j][i]
        print(f'{matrix[i][j]:>5}', end='')
    print(f'{sum_column:>5}')
The problem is that I should add the sums as a row at the end of the matrix, but this is what I get:
Enter the size of the matrix: 5
15 23 14 22 20 73
7 26 26 27 27 160
17 36 9 13 42 104
1 32 41 2 29 113
33 43 14 49 12 130
It counts correctly, but how can I add the sums to the end of the matrix and sort in ascending order of the column sums? I hope some of you will understand what I need. Thanks.
Do you mean something like this?
import numpy as np
matrix = np.array(matrix)
rowsum = matrix.sum(axis=1) # sum of rows
idx = np.argsort(rowsum) # permutation that makes rowsum sorted
result = np.hstack([matrix, rowsum[:, None]]) # join matrix and rowsum
result = result[idx] # sort rows in ascending order of their sums
For the matrix
array([[31, 13, 29, 5, 1],
[21, 9, 34, 31, 22],
[13, 38, 29, 20, 50],
[21, 12, 26, 5, 15],
[19, 24, 38, 44, 41]])
the output would be:
array([[ 31, 13, 29, 5, 1, 79],
[ 21, 12, 26, 5, 15, 79],
[ 21, 9, 34, 31, 22, 117],
[ 13, 38, 29, 20, 50, 150],
[ 19, 24, 38, 44, 41, 166]])
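If the goal is instead to sort the columns by their sums and append those sums as a final row (one possible reading of the question, so treat this as an assumption), a plain-Python sketch could look like the following. The function name sort_columns_by_sum is hypothetical.

```python
def sort_columns_by_sum(matrix):
    """Reorder columns by ascending column sum and append the sums as a last row."""
    columns = list(zip(*matrix))   # transpose: one tuple per column
    columns.sort(key=sum)          # stable sort, ascending by column sum
    rows = [list(row) for row in zip(*columns)]  # transpose back to rows
    rows.append([sum(col) for col in columns])   # final row holds the sums
    return rows

example = [
    [3, 1, 2],
    [3, 1, 2],
]
for row in sort_columns_by_sum(example):
    print(''.join(f'{value:>5}' for value in row))
```

The original nested list is left unchanged; the function returns a new list of rows.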

Python FloorOfX Using Binary Search

How do I write this function using binary search?
def floorofx(L, x):
    pass
For example, by defining low, high, and middle indices. Sample cases:
Input: L = [11, 12, 13, 14, 15, 20, 27, 28], x = 17
Output: 15
15 is the largest element in L smaller than 17.
Input: L = [11, 12, 13, 14, 15, 16, 19], x = 20
Output: 19
19 is the largest element in L smaller than 20.
Input: L = [1, 2, 8, 10, 10, 12, 19], x = 0
Output: -1
Since floor doesn't exist, output is -1.
Can someone help me with it?
You can simply use the bisect module and decrement the obtained index.
from bisect import bisect_right
def floorofx(L, x):
    idx = bisect_right(L,x)
    return L[idx-1] if idx > 0 else -1
This generates the following results (for your given sample input):
>>> floorofx(L = [11, 12, 13, 14, 15, 20, 27, 28], x = 17)
15
>>> floorofx(L = [11, 12, 13, 14, 15, 16, 19], x = 20)
19
>>> floorofx(L = [1, 2, 8, 10, 10, 12, 19], x = 0)
-1
Mind that L must be sorted, and that in case -1 is an element of L, you cannot make the distinction between "not found" and -1 as a result. Since we use binary search, the algorithm runs in O(log n).
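For readers who want the explicit low, high, and middle bookkeeping the question asks about, a hand-written binary search might look like the sketch below. The name floorofx_manual is hypothetical, chosen to keep it apart from the bisect version. Like that version, it assumes L is sorted and treats an element equal to x as a valid floor.

```python
def floorofx_manual(L, x):
    """Return the largest element of sorted list L that is <= x, or -1 if none."""
    low, high = 0, len(L) - 1
    result = -1
    while low <= high:
        middle = (low + high) // 2
        if L[middle] <= x:
            result = L[middle]   # candidate floor; look for a larger one to the right
            low = middle + 1
        else:
            high = middle - 1    # L[middle] is too large; search the left half
    return result

print(floorofx_manual([11, 12, 13, 14, 15, 20, 27, 28], 17))  # prints 15
print(floorofx_manual([11, 12, 13, 14, 15, 16, 19], 20))      # prints 19
print(floorofx_manual([1, 2, 8, 10, 10, 12, 19], 0))          # prints -1
```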

Pandas repeated values

Is there a more idiomatic way of doing this in Pandas?
I want to set up a column that repeats the integers 1 to 48, for an index of length 2000:
df = pd.DataFrame(np.zeros((2000, 1)), columns=['HH'])
h = 1
for i in range(0, 2000):
    df.loc[i, 'HH'] = h
    if h >= 48:
        h = 1
    else:
        h += 1
Here is a more direct and faster way:
pd.DataFrame(np.tile(np.arange(1, 49), 2000 // 48 + 1)[:2000], columns=['HH'])
The detailed steps:
np.arange(1, 49) creates an array from 1 to 48 (inclusive)
>>> l = np.arange(1, 49)
>>> l
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48])
np.tile(A, N) repeats the array A N times, so in this case you get [1 2 3 ... 48 1 2 3 ... 48 ... 1 2 3 ... 48]. You should repeat the array 2000 // 48 + 1 times in order to get at least 2000 values.
>>> r = np.tile(l, 2000 // 48 + 1)
>>> r
array([ 1, 2, 3, ..., 46, 47, 48])
>>> r.shape # The array is slightly larger than 2000
(2016,)
[:2000] retrieves the first 2000 values from the generated array to create your DataFrame.
>>> d = pd.DataFrame(r[:2000], columns=['HH'])
Alternatively, in a single expression:
df = pd.DataFrame({'HH':np.append(np.tile(range(1,49),int(2000/48)), range(1,np.mod(2000,48)+1))})
That is, appending 2 arrays:
(1) np.tile(range(1,49),int(2000/48))
len(np.tile(range(1,49),int(2000/48)))
1968
(2) range(1,np.mod(2000,48)+1)
len(range(1,np.mod(2000,48)+1))
32
And constructing the DataFrame from a corresponding dictionary.
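The repeating pattern can also be expressed with modular arithmetic, since position i (counting from 0) holds the value i % 48 + 1. The following is a minimal standard-library sketch of that idea, not one of the answers above; the function name repeating_sequence is hypothetical.

```python
def repeating_sequence(length, period):
    """Return [1, 2, ..., period, 1, 2, ...] truncated to `length` items."""
    return [i % period + 1 for i in range(length)]

hh = repeating_sequence(2000, 48)
print(hh[:3], hh[47:50], hh[-1])  # prints [1, 2, 3] [48, 1, 2] 32
```

The resulting list could then be used as the 'HH' column when constructing the DataFrame.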

extracting numbers from list

I've created a list (which is sorted):
indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
I want to extract the numbers from this list that are at least five away from each other and input them into another list. This is kind of confusing. This is an example of how I want the output:
outlist = [0, 7, 19, 25, 31]
As you can see, none of the numbers are within 5 of each other.
I've tried this method:
for index2 in range(0, len(indexlist) - 1):
    if indexlist[index2 + 1] > indexlist[index2] + 5:
        outlist.append(indexlist[index2])
However, this gives me this output:
outlist = [0, 12, 19]
Sure, the numbers are at least 5 away, however, I'm missing some needed values.
Any ideas about how I can accomplish this task?
You need to keep track of the last item you added to the list, not just compare to the following value:
In [1]: indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
In [2]: last = -1000 # starting value hopefully low enough :)
In [3]: resultlist = []
In [4]: for item in indexlist:
   ...:     if item > last+5:
   ...:         resultlist.append(item)
   ...:         last = item
   ...:
In [5]: resultlist
Out[5]: [0, 7, 19, 25, 31]
This should do the trick. Here, as I said in a comment, outlist is initialised with the first value of indexlist, and each later element of indexlist is compared to the last value added. It is a rough solution, but it works.
indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
outlist = [indexlist[0]]
# Iterate up to len(indexlist) so the final element is also considered.
for index2 in range(1, len(indexlist)):
    if indexlist[index2] > (outlist[-1] + 5):
        outlist.append(indexlist[index2])
output:
>>> outlist
[0, 7, 19, 25, 31]
Tim Pietzcker's answer is right but this can also be done without storing the last added item in a separate variable. Instead you can read the last value in outlist:
>>> indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
>>> outlist = []
>>> for n in indexlist:
...     if not outlist or n > outlist[-1] + 5:
...         outlist.append(n)
...
>>> outlist
[0, 7, 19, 25, 31]
I suppose your index_list is sorted. Then this will give you only indexes that are more than MIN_INDEX_OFFSET apart.
MIN_INDEX_OFFSET = 5
index_list = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
last_accepted = index_list[0]
out_list = [last_accepted]
for index in index_list:
    # Accept an index only if it is far enough from the last accepted one.
    if index - last_accepted > MIN_INDEX_OFFSET:
        out_list.append(index)
        last_accepted = index
print(out_list)
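The answers above share one greedy idea: walk the sorted list and keep a value only when it is more than the minimum gap away from the last value kept. A reusable sketch of that idea follows. The function name pick_spaced and its min_gap parameter are hypothetical, and the input is assumed to be sorted in ascending order.

```python
def pick_spaced(sorted_values, min_gap):
    """Greedily keep values that are more than `min_gap` above the last kept value."""
    picked = []
    for value in sorted_values:
        # Always keep the first value; afterwards compare with the last kept one.
        if not picked or value - picked[-1] > min_gap:
            picked.append(value)
    return picked

indexlist = [0, 7, 8, 12, 19, 25, 26, 27, 29, 30, 31, 33]
print(pick_spaced(indexlist, 5))  # prints [0, 7, 19, 25, 31]
```

Note that the comparison is strict, so 12 is dropped even though it is exactly 5 above 7, which matches the expected output in the question.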
