Frequency of numbers in an array - python

I want to get the frequency of numbers in an unsorted array. I am getting the frequency of numbers, but the output shows the frequency of a particular number multiple times. I want the resulting frequency to be shown only once.
A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
B = max(A) + 1
F = [None] * B
for i in range(0,B):
    F[i] = 0
for j in range(0,len(A)):
    F[A[j]] = F[A[j]] + 1
for k in range(0,len(A)):
    if F[A[k]] != 0:
        print("Frequency of ", A[k] , " is : " , F[A[k]])
The output shows the frequency of, say, 2 four times:
Frequency of 2 is : 4
Frequency of 5 is : 1
Frequency of 1 is : 1
Frequency of 2 is : 4
Frequency of 4 is : 2
Frequency of 6 is : 1
Frequency of 3 is : 4
Frequency of 10 is : 1
Frequency of 3 is : 4
Frequency of 4 is : 2
Frequency of 3 is : 4
Frequency of 2 is : 4
Frequency of 3 is : 4
Frequency of 2 is : 4
Frequency of 15 is : 1
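The duplicates come from the last loop: it walks over A (every occurrence) instead of over the distinct values. Below is a minimal sketch of a fix that keeps the counting-array idea from the question. It assumes all values are non-negative integers, and the helper name value_frequencies is made up for illustration.

```python
def value_frequencies(values):
    """Count non-negative integers with a counting array.

    Returns a list of (value, count) pairs, one per distinct value.
    """
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1
    # Walk the counting array itself, so each value is reported exactly once.
    return [(v, c) for v, c in enumerate(counts) if c != 0]


A = [2, 5, 1, 2, 4, 6, 3, 10, 3, 4, 3, 2, 3, 2, 15]
for value, count in value_frequencies(A):
    print("Frequency of", value, "is :", count)
```

Because the loop runs over the indices of the counting array, the values come out in ascending order rather than in order of first appearance.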

Use collections.Counter for this
In [1]: from collections import Counter
In [2]: A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
In [3]: for k, v in Counter(A).items():
   ...:     print('Frequency of {} is {}'.format(k, v))
   ...:
Frequency of 2 is 4
Frequency of 5 is 1 ...

You can use a dict data structure for that. See the well-commented code below:
# This function creates the collection frequencies
def get_collection_frequency(mylist):
    # Dictionary data structure is used
    mydict = {}
    # Loop through the input list
    for item in mylist:
        # If the item is already there
        if item in mydict:
            # Increase its frequency
            mydict[item] += 1
        # If it is not
        else:
            # Set its frequency equal to 1
            mydict[item] = 1
    # Return the dictionary
    return mydict
A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
new = get_collection_frequency(A)
print(new)
Returns: {2: 4, 5: 1, 1: 1, 4: 2, 6: 1, 3: 4, 10: 1, 15: 1}

Take the set of the list to drop the duplicates, then loop over it (sorted here so the output order is predictable):
for num in sorted(set(A)):
    print("Frequency of {} is {}".format(num,A.count(num)))
Output:
Frequency of 1 is 1
Frequency of 2 is 4
Frequency of 3 is 4
Frequency of 4 is 2
Frequency of 5 is 1
Frequency of 6 is 1
Frequency of 10 is 1
Frequency of 15 is 1

Related

grouping a list values based on max value

I'm working on a k-means algorithm to cluster a list of numbers. If I have an array (X)
X=array([[0.85142858],[0.85566274],[0.85364912],[0.81536489],[0.84929932],[0.85042336],[0.84899714],[0.82019115], [0.86112067],[0.8312496 ]])
and then run the following code:
from sklearn.cluster import AgglomerativeClustering
cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
cluster.fit_predict(X)
for i in range(len(X)):
    print("%4d " % cluster.labels_[i], end=""); print(X[i])
I got these results:
1 1 [0.85142858]
2 3 [0.85566274]
3 3 [0.85364912]
4 0 [0.81536489]
5 1 [0.84929932]
6 1 [0.85042336]
7 1 [0.84899714]
8 0 [0.82019115]
9 4 [0.86112067]
10 2 [0.8312496]
How do I get the max number in each cluster, together with its position (i)? Like this:
0: 0.82019115 8
1: 0.85142858 1
2: 0.8312496 10
3: 0.85566274 2
4: 0.86112067 9
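First pair each label with its value using zip, then sort the pairs by value (the second element of each pair) in increasing order and build a dict from them. Because later entries overwrite earlier ones with the same key, each label ends up mapped to its largest value.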
Try:
res = list(zip(cluster.labels_, X))
max_num = dict(sorted(res, key=lambda x: x[1], reverse=False))
max_num:
{0: array([0.82019115]),
2: array([0.8312496]),
1: array([0.85142858]),
3: array([0.85566274]),
4: array([0.86112067])}
Edit:
Do you want this?
elem = list(zip(res, range(1,len(X)+1)))
e = sorted(elem, key=lambda x: x[0][1], reverse=False)
final_dict = {k[0]:(k[1], v) for (k,v) in e}
for key in sorted(final_dict):
    print(f"{key}: {final_dict[key][0][0]} {final_dict[key][1]}")
0: 0.82019115 8
1: 0.85142858 1
2: 0.8312496 10
3: 0.85566274 2
4: 0.86112067 9
OR
import pandas as pd
df = pd.DataFrame(zip(cluster.labels_,X))
df[1] = df[1].str[0]
df = df.sort_values(1).drop_duplicates([0],keep='last')
df.index = df.index+1
df = df.sort_values(0)
df:
    0         1
8   0  0.820191
1   1  0.851429
10  2  0.831250
2   3  0.855663
9   4  0.861121
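For comparison, here is a minimal sketch that needs neither sorting nor pandas: a single pass that keeps the largest value seen so far for each label. The labels and values below are copied from the question's output as plain lists. With the real objects you would presumably pass cluster.labels_ and a flattened X such as X[:, 0]; that substitution is an assumption, not something tested here.

```python
# Labels and values copied from the question's printed results.
labels = [1, 3, 3, 0, 1, 1, 1, 0, 4, 2]
values = [0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
          0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]

best = {}  # label -> (largest value so far, 1-based position of that value)
for position, (label, value) in enumerate(zip(labels, values), start=1):
    if label not in best or value > best[label][0]:
        best[label] = (value, position)

for label in sorted(best):
    value, position = best[label]
    print(f"{label}: {value} {position}")
```

On ties, the strict comparison keeps the first position at which the maximum occurred.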

Make a frequency table with categories in Python

I am trying to make an easy frequency table in Python, but I can't find the answer. My data contains numbers from 0 to 10, for example:
1,2,3,4,5,5,5,8,8,8,0,9,10,2,2,10,10,7,7,7,7,9.
I want to make a frequency table with the counts and percentages (zeros excluded!) of these values grouped into 3 categories:
Category 1 : lower than 5.5
Category 2 : between 5.5 and 8
Category 3 : 8 or higher
My output then needs to be:
Category 1 : frequency 9 / 43%
Category 2 : frequency 4 / 19%
Category 3 : frequency 8 / 38%
How do I do this in Python?
Updated version that will work for your use-case:
dd = {"cat_1": 0, "cat_2": 0, "cat_3": 0}
values = [1,2,3,4,5,5,5,8,8,8,0,9,10,2,2,10,10,7,7,7,7,9]
for value in values:
    if value > 0 and value < 5.5:
        dd["cat_1"] += 1
    elif value >= 5.5 and value < 8:
        dd["cat_2"] += 1
    elif value >= 8:
        dd["cat_3"] += 1
# Percentages are relative to the number of non-zero values
total = len(values) - values.count(0)
print(f"Category 1 : frequency {dd['cat_1']} / {dd['cat_1'] / total * 100:.0f}%")
print(f"Category 2 : frequency {dd['cat_2']} / {dd['cat_2'] / total * 100:.0f}%")
print(f"Category 3 : frequency {dd['cat_3']} / {dd['cat_3'] / total * 100:.0f}%")
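If the number of categories may change, the if/elif chain can be replaced by a lookup into a sorted list of boundaries. This is only a sketch using the standard-library bisect module. The function name category_table and the edges parameter are invented for illustration, and it assumes at least one non-zero value in the input.

```python
from bisect import bisect_right


def category_table(values, edges=(5.5, 8)):
    """Return one (count, rounded percent) pair per category, ignoring zeros.

    With edges (5.5, 8) the categories are: below 5.5, from 5.5 up to
    (but not including) 8, and 8 or higher.
    """
    nonzero = [v for v in values if v != 0]
    counts = [0] * (len(edges) + 1)
    for v in nonzero:
        # bisect_right places a value equal to an edge in the higher category.
        counts[bisect_right(edges, v)] += 1
    return [(c, round(100 * c / len(nonzero))) for c in counts]


values = [1, 2, 3, 4, 5, 5, 5, 8, 8, 8, 0, 9, 10, 2, 2, 10, 10, 7, 7, 7, 7, 9]
for number, (count, percent) in enumerate(category_table(values), start=1):
    print(f"Category {number} : frequency {count} / {percent}%")
```

Adding a fourth category then only means adding one more boundary to edges.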

Compare current column value to different column value by row slices

Assuming a dataframe like this
In [5]: data = pd.DataFrame([[9,4],[5,4],[1,3],[26,7]])
In [6]: data
Out[6]:
    0  1
0   9  4
1   5  4
2   1  3
3  26  7
I want to count how many times the values in a rolling window/slice of 2 on column 0 are greater or equal to the value in col 1 (4).
On the first number 4 at col 1, a slice of 2 on column 0 yields 5 and 1, so the output would be 2 since both numbers are greater than 4, then on the second 4 the next slice values on col 0 would be 1 and 26, so the output would be 1 because only 26 is greater than 4 but not 1. I can't use rolling window since iterating through rolling window values is not implemented.
I need something like a slice of the previous n rows and then I can iterate, compare and count how many times any of the values in that slice are above the current row.
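I did this with plain lists instead of operating on the data frame directly. See the code below:
# df is the data frame from the question; if its column labels are the
# integers 0 and 1 (as in the question), use df[0] and df[1] instead
list1, list2 = df['0'].values.tolist(), df['1'].values.tolist()
outList = []
for ix in range(len(list1)):
    if ix < len(list1) - 2:
        # compare the current col-1 value with the next two col-0 values
        if list2[ix] < list1[ix + 1] and list2[ix] < list1[ix + 2]:
            outList.append(2)
        elif list2[ix] < list1[ix + 1] or list2[ix] < list1[ix + 2]:
            outList.append(1)
        else:
            outList.append(0)
    else:
        # fewer than two rows remain below this one
        outList.append(0)
df['2_rows_forward_moving_tag'] = pd.Series(outList)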
Output:
    0  1  2_rows_forward_moving_tag
0   9  4                          1
1   5  4                          1
2   1  3                          0
3  26  7                          0
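The same idea can be written for any window size. The sketch below is one possible generalization, not the answerer's code: the function name forward_window_counts is invented, it uses >= (the question asks for "greater or equal", while the code above uses a strict <), and it also counts partial windows near the end of the data instead of forcing them to 0.

```python
def forward_window_counts(col0, col1, window=2):
    """For each row i, count how many of the next `window` values of col0
    are greater than or equal to col1[i]."""
    counts = []
    for i, threshold in enumerate(col1):
        # Slicing past the end is safe: it just yields a shorter window.
        upcoming = col0[i + 1 : i + 1 + window]
        counts.append(sum(1 for v in upcoming if v >= threshold))
    return counts


col0 = [9, 5, 1, 26]
col1 = [4, 4, 3, 7]
print(forward_window_counts(col0, col1))
```

For the sample data this gives [1, 1, 1, 0]. The third entry differs from the table above because the single remaining row (26 >= 3) is counted here.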

Group by a range of numbers Python

I have a list of numbers in a Python data frame and want to group them by value and count the occurrences. The numbers range from 0 to 20, but a value such as 6 might not occur at all; in that case I want its count shown as 0.
The data frame column looks like this:
|points|
5
1
7
3
2
2
1
18
15
4
5
I want the result to look like this:
range | count
1 2
2 2
3 1
4 1
5 2
6 0
7 ...
8
9...
I would iterate through the input lines and fill up a dict with the values.
All you have to do then is count...
import collections
# read your input and store the numbers in a list
lines = []
with open('input.txt') as f:
    lines = [int(line.rstrip()) for line in f]
# pre-fill the dictionary with 0s from 0 to the highest occurring number in your input
values = {}
for i in range(max(lines)+1):
    values[i] = 0
# increment the occurrence by 1 for any found value
for val in lines:
    values[val] += 1
# order the dict
values = collections.OrderedDict(sorted(values.items()))
print("range\t|\tcount")
for k in values:
    print(str(k) + "\t\t\t" + str(values[k]))
repl: https://repl.it/repls/DesertedDeafeningCgibin
Edit:
A slightly more elegant version using a dict comprehension:
# read input as in the first example
values = {i : 0 for i in range(max(lines)+1)}
for val in lines:
    values[val] += 1
# order and print as in the first example
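Another option is collections.Counter, which returns 0 for keys it has never seen, so no pre-filling is needed. The sketch below uses the sample column from the question as a plain list and prints every value from 0 to 20, the range stated in the question. With a real data frame, the list would presumably come from the points column.

```python
from collections import Counter

# Sample data copied from the question's "points" column.
points = [5, 1, 7, 3, 2, 2, 1, 18, 15, 4, 5]
counts = Counter(points)

print("range | count")
for value in range(0, 21):  # 0..20 inclusive
    # Looking up a missing key returns 0 and does not add the key.
    print(f"{value:>5} | {counts[value]}")
```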

How to count the frequency of numbers given in a text file

How can I count the frequency of numbers given in a text file? The text file is as follows:
0
2
0
1
0
1
55
100
100
I want the output as follows
0 3
1 2
2 1
55 1
100 2
I tried this without success
def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        H[val] = H.get(val,0) + 1
    if flAsList:
        return H.items()
    return H
Is there a better way? Thanks in advance!
Use Counter. It's the best way for this type of problem:
from collections import Counter
with open('file.txt', 'r') as fd:
    lines = fd.read().split()
counter = Counter(lines)
# sorts items
items = sorted(counter.items(), key=lambda x: int(x[0]))
# prints desired output
for k, repetitions in items:
    print(k, '\t', repetitions)
The output:
0 3
1 2
2 1
55 1
100 2
Use a Counter object for this:
from collections import Counter
c = Counter(A)
Now the c variable will hold a frequency map of each of the values. For instance:
Counter(['a', 'b', 'c', 'a', 'c', 'a'])
=> Counter({'a': 3, 'c': 2, 'b': 1})
Please consider using update:
def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        # H[val] = H.get(val,0) + 1
        if val in H:  # dict.has_key() was removed in Python 3
            H[val] = H[val] + 1
        else:
            H.update({val : 1})
    if flAsList:
        return H.items()
    return H
Simple approach using a dictionary:
histogram = {}
with open("file","r") as f:
    for line in f:
        try:
            histogram[line.strip()] +=1
        except KeyError:
            histogram[line.strip()] = 1
for key in sorted(histogram.keys(),key=int):
    print(key,"\t",histogram[key])
Output:
0 3
1 2
2 1
55 1
100 2
Edit:
To select a specific column, split the line using split(). For example, to count the sixth field when splitting on a single space:
try:
    histogram[line.strip().split(' ')[5]] +=1
except KeyError:
    histogram[line.strip().split(' ')[5]] = 1
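The column selection and the counting can also be combined in one small helper. This is a sketch rather than a drop-in replacement: the name count_field is invented, it splits on any run of whitespace (not on a single space as above), and it silently skips lines that are blank or too short. An in-memory io.StringIO stands in for the real file so the example runs on its own.

```python
from collections import Counter
import io


def count_field(lines, field=0):
    """Count the values found in one whitespace-separated field of each line."""
    counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > field:  # skip blank or short lines
            counter[parts[field]] += 1
    return counter


# Stand-in for open("file"), holding the sample data from the question.
sample = io.StringIO("0\n2\n0\n1\n0\n1\n55\n100\n100\n")
counts = count_field(sample)
for key in sorted(counts, key=int):
    print(f"{key}\t{counts[key]}")
```

For the sixth field you would call count_field(f, field=5) on the open file object.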
