How to count the frequency of numbers given in a text file - python

How to count the frequency of numbers given in a text file. The text file is as follows.
0
2
0
1
0
1
55
100
100
I want the output as follows
0 3
1 2
2 1
55 1
100 2
I tried this without success
def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        H[val] = H.get(val,0) + 1
    if flAsList:
        return H.items()
    return H
Is there a better way? Thanks in advance!

Use Counter. It is well suited to this type of problem:
from collections import Counter
with open('file.txt', 'r') as fd:
    lines = fd.read().split()
counter = Counter(lines)
# sort items numerically by key
items = sorted(counter.items(), key=lambda x: int(x[0]))
# print the desired output (Python 3 print function)
for k, repetitions in items:
    print(k, '\t', repetitions)
The output:
0 3
1 2
2 1
55 1
100 2

Use a Counter object for this:
from collections import Counter
c = Counter(A)
Now the c variable will hold a frequency map of each of the values. For instance:
Counter(['a', 'b', 'c', 'a', 'c', 'a'])
=> Counter({'a': 3, 'c': 2, 'b': 1})

Please consider using update:
def histogram( A, flAsList=False ):
    """Return histogram of values in array A."""
    H = {}
    for val in A:
        # H[val] = H.get(val,0) + 1
        # dict.has_key() was removed in Python 3; use the in operator instead
        if val in H:
            H[val] = H[val] + 1
        else:
            H.update({val : 1})
    if flAsList:
        return H.items()
    return H

Simple approach using a dictionary:
histogram = {}
with open("file","r") as f:
    for line in f:
        try:
            histogram[line.strip()] +=1
        except KeyError:
            histogram[line.strip()] = 1
# sort the keys numerically, then print (Python 3 print function)
for key in sorted(histogram.keys(),key=int):
    print(key,"\t",histogram[key])
Output:
0 3
1 2
2 1
55 1
100 2
Edit:
To select a specific column you'd want to split the line using split(). For example the sixth field by splitting on a single space:
try:
    histogram[line.strip().split(' ')[5]] +=1
except KeyError:
    histogram[line.strip().split(' ')[5]] = 1
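The column-selection idea above can also be combined with Counter. The following is a minimal sketch, not part of the original answer: the helper name count_column, its parameters, and the sample lines are assumptions made for illustration.

```python
from collections import Counter

def count_column(lines, column, sep=None):
    """Count the values found in one column of each line.

    count_column, column and sep are hypothetical names for this sketch.
    sep=None splits on any run of whitespace, like str.split().
    """
    counts = Counter()
    for line in lines:
        fields = line.split(sep)
        # Skip blank or short lines instead of raising IndexError.
        if len(fields) > column:
            counts[fields[column]] += 1
    return counts

# Made-up sample data: count the second field (index 1) of each line.
sample = ["a 0", "b 2", "c 0", "", "d 1"]
counts = count_column(sample, 1)
for key in sorted(counts, key=int):
    print(key, counts[key], sep="\t")
```

Skipping short lines is a design choice for this sketch; raising an error instead may be preferable when every line is expected to have the column.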

Related

Create a new column with [0,1] based on match between two rows in Python

I am trying to compare multiple lists or dataframes to one large base dataframe.
Then, for any match, I want to append a column storing 1 = Match or 0 = No Match.
df = pd.DataFrame({'Name':['A','B','C','D'], 'ID' : ['5-6','6-7','8-9','7']})
list1 = ['5-6','8-9']
list2 = ['7','4-3']
As the values I am trying to match include a '-', they are treated as strings.
I can already generate a list of matching values, but if I append them, they are all 0:
def f(rows):
    for i in df['ID']:
        for j in list1:
            if i == j:
                val = 1
            else:
                val = 0
    return val
df['Answer']= df.apply(f,axis=1)
While this loop:
for i in df['ID']:
    for j in list1:
        if i == j:
            print (i)
finds all matching values.
Thanks in advance!
You can use Series.isin instead of a loop here:
df['Answer'] = df['ID'].isin(list1).astype(int)
  Name   ID  Answer
0    A  5-6       1
1    B  6-7       0
2    C  8-9       1
3    D    7       0
DataFrame.apply with axis=1 already calls the function once per row, so you can drop the explicit loops and test membership in the list with the in operator:
def f(rows):
    if rows['ID'] in list1:
        val = 1
    else:
        val = 0
    return val
df['Answer']= df.apply(f,axis=1)
print (df)
  Name   ID  Answer
0    A  5-6       1
1    B  6-7       0
2    C  8-9       1
3    D    7       0
Simpler still is to use a lambda function on the specific column:
df['Answer']= df['ID'].apply(lambda x: 1 if x in list1 else 0)
Or:
df['Answer']= df['ID'].apply(lambda x: int(x in list1))
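The question mentions several lists, while the answers above handle only list1. A minimal sketch of one way to extend the isin approach to both lists follows; the column names in_list1 and in_list2 are hypothetical and chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'ID': ['5-6', '6-7', '8-9', '7']})
list1 = ['5-6', '8-9']
list2 = ['7', '4-3']

# Hypothetical column names, one per list to compare against.
lists = {'in_list1': list1, 'in_list2': list2}
for column, values in lists.items():
    # isin returns a boolean Series; astype(int) maps True/False to 1/0.
    df[column] = df['ID'].isin(values).astype(int)

print(df)
```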

Group by a range of numbers Python

I have a list of numbers in a Python data frame and want to group these numbers by a specific range and count them. The numbers range from 0 to 20, but let's say there might not be any number 6; in that case I want it to show 0.
The dataframe column looks like this:
|points|
5
1
7
3
2
2
1
18
15
4
5
I want it to look like the below
range | count
1 2
2 2
3 1
4 1
5 2
6 0
7 ...
8
9...
I would iterate through the input lines and fill up a dict with the values.
All you have to do then is count...
import collections
#read your input and store the numbers in a list
lines = []
with open('input.txt') as f:
    lines = [int(line.rstrip()) for line in f]
#pre fill the dictionary with 0s from 0 to the highest occurring number in your input.
values = {}
for i in range(max(lines)+1):
    values[i] = 0
# increment the occurrence by 1 for any found value
for val in lines:
    values[val] += 1
# Order the dict:
values = collections.OrderedDict(sorted(values.items()))
print("range\t|\tcount")
for k in values:
    print(str(k) + "\t\t\t" + str(values[k]))
repl: https://repl.it/repls/DesertedDeafeningCgibin
Edit:
A slightly more elegant version using a dict comprehension:
# read input as in the first example
values = {i : 0 for i in range(max(lines)+1)}
for val in lines:
    values[val] += 1
# order and print as in the first example
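Because a Counter returns 0 for keys it has never seen, the pre-filling step can be skipped. The following is a minimal sketch, assuming the fixed range 0 to 20 from the question and using its sample points; the name count_in_range is made up for illustration.

```python
from collections import Counter

def count_in_range(numbers, low, high):
    """Return (value, count) pairs for every integer in [low, high]."""
    counts = Counter(numbers)
    # A Counter returns 0 for missing keys, so absent values show up as 0.
    return [(n, counts[n]) for n in range(low, high + 1)]

points = [5, 1, 7, 3, 2, 2, 1, 18, 15, 4, 5]
table = count_in_range(points, 0, 20)
print("range\t|\tcount")
for value, count in table:
    print(f"{value}\t|\t{count}")
```

Unlike the version above, this prints every value up to 20 even when the largest number in the input is smaller.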

Frequency of numbers in an array

I want to get the frequency of numbers in an unsorted array. I am getting the frequency of numbers, but the output shows the frequency of a particular number multiple times. I want the resulting frequency to be shown only once.
A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
B = max(A) + 1
F =[None] * B
for i in range(0,B):
    F[i] = 0
for j in range(0,len(A)):
    F[A[j]] = F[A[j]] + 1
for k in range(0,len(A)):
    if F[A[k]] != 0:
        print("Frequency of ", A[k] , " is : " , F[A[k]])
The output shows the frequency of, say, 2 four times:
Frequency of 2 is : 4
Frequency of 5 is : 1
Frequency of 1 is : 1
Frequency of 2 is : 4
Frequency of 4 is : 2
Frequency of 6 is : 1
Frequency of 3 is : 4
Frequency of 10 is : 1
Frequency of 3 is : 4
Frequency of 4 is : 2
Frequency of 3 is : 4
Frequency of 2 is : 4
Frequency of 3 is : 4
Frequency of 2 is : 4
Frequency of 15 is : 1
Use collections.Counter for this:
In [1]: from collections import Counter
In [2]: A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
In [3]: for k, v in Counter(A).items():
   ...:     print('Frequency of {} is {}'.format(k, v))
   ...:
Frequency of 2 is 4
Frequency of 5 is 1
...
You can use a dict data structure for that. See the well commented code within:
# This function creates the collection frequencies
def get_collection_frequency(mylist):
    # Dictionary data structure is used
    mydict = {}
    # Loop through the input list
    for index in mylist:
        # If the item is already there
        if (index in mydict):
            # Increase its frequency
            mydict[index] += 1
        # If it is not
        else:
            # Set its frequency equal to 1
            mydict[index] = 1
    # Return the dictionary
    return mydict
A = [2,5,1,2,4,6,3,10,3,4,3,2,3,2,15]
new = get_collection_frequency(A)
print(new)
Returns: {2: 4, 5: 1, 1: 1, 4: 2, 6: 1, 3: 4, 10: 1, 15: 1}
Take the set of the list to remove duplicates, then just loop through it (note that A.count rescans the whole list for each distinct value):
for num in set(A):
    print("Frequency of {} is {}".format(num,A.count(num)))
output:
Frequency of 1 is 1
Frequency of 2 is 4
Frequency of 3 is 4
Frequency of 4 is 2
Frequency of 5 is 1
Frequency of 6 is 1
Frequency of 10 is 1
Frequency of 15 is 1
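The set-based loop calls A.count once per distinct value, and the iteration order of a set is not guaranteed. A minimal sketch of an alternative that scans the list once and sorts explicitly, using the same list A from the question:

```python
from collections import Counter

A = [2, 5, 1, 2, 4, 6, 3, 10, 3, 4, 3, 2, 3, 2, 15]

# Counter scans A once; sorted() orders the (value, count) pairs by value.
frequencies = sorted(Counter(A).items())
for num, freq in frequencies:
    print("Frequency of {} is {}".format(num, freq))
```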

Python nested loop miscounting instances of integers in list

I'm banging my head against the wall trying to figure out why this nested loop is miscounting the number of times an integer occurs in a list. I've set up a function to take two lines of input, n and ar, where n is the number of integers in ar and ar is the list of integers. My code is below:
import sys
n = sys.stdin.readline()
n = int(n)
ar = sys.stdin.readline()
ar = ar.split(' ')
ar = [int(i) for i in ar]
def find_mode(n,ar):
    # create an empty dict and initially set count of all integers to 1
    d = {}
    for i in range(n):
        d[ar[i]] = 1
    for i in range(n):
        # hold integer i constant and check subsequent integers against it
        # increase count if match
        x = ar[i]
        for k in range(i+1,n):
            if ar[k] == x:
                d[ar[k]] += 1
    print(d)
The counter seems to be increasing the count by 1 every time, which leads me to believe it's a problem with the nested loop.
>>> 9
>>> 1 2 3 4 4 9 9 0 0
{0: 2, 1: 1, 2: 1, 3: 1, 4: 2, 9: 2}
OK
>>> 10
>>> 1 2 3 4 4 9 9 0 0 0
{0: 4, 1: 1, 2: 1, 3: 1, 4: 2, 9: 2}
Count of 0 increased by +2
>>> 11
>>> 1 2 3 4 4 9 9 0 0 0 0
{0: 7, 1: 1, 2: 1, 3: 1, 4: 2, 9: 2}
Count of 0 increased by +3
I understand there might be more efficient or "pythonic" ways to count the amount of times a number occurs in a list but this was the solution I came up with and as someone still learning Python, it would help to understand why this exact solution is failing. Many thanks in advance.
This is because for each element of the list (call it x) you count the number of subsequent appearances of x. This is fine if a number occurs at most twice, but if it occurs three or more times, its later occurrences are counted again on every pass, so you over-count.
For example, take [0, 0, 0, 0]. You iterate over the list, and for each item you iterate over the part of the list that follows that item. So for the first 0 you count 3 subsequent 0s, for the second you count 2, and for the third you count 1, which makes 6. Added to the initial count of 1, that gives 7 instead of 4, which is why you end up with 3 too many.
You can achieve this task by using collections.Counter:
>>> from collections import Counter
>>> d = Counter(ar)
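The over-count described in this answer can be reproduced in isolation. The sketch below is illustrative only: nested_loop_count mirrors the counting logic from the question, single_pass_count is one possible fix, and both function names are made up here.

```python
def nested_loop_count(ar):
    """Reproduce the question's counting logic (over-counts repeats)."""
    d = {x: 1 for x in ar}
    n = len(ar)
    for i in range(n):
        for k in range(i + 1, n):
            if ar[k] == ar[i]:
                d[ar[k]] += 1
    return d

def single_pass_count(ar):
    """Count each element exactly once."""
    d = {}
    for x in ar:
        d[x] = d.get(x, 0) + 1
    return d

zeros = [0, 0, 0, 0]
print(nested_loop_count(zeros))   # 1 + (3 + 2 + 1) = 7
print(single_pass_count(zeros))   # 4
```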
I'm not exactly sure that I can fix your specific problem, but would something like this work instead?
d={}
for x in ar:
d[x] = d.get(x, 0) + 1
I understand that you want to fix your existing work as a learning exercise, but I'm not sure that that approach is even the right one. As it is, I can't really tell what you're going for, so it's hard for me to offer specific advice. I would recommend that you don't throw good time after bad.
Python lists have a method that does exactly what you're describing.
It's called .count().
If you do ar.count(3), it will return the number of occurrences of 3 in the list ar.
In your case:
There's no need for a nested loop as you only need one loop.
Try this:
dic = {}
for num in ar:
    if num not in dic:
        dic[num] = 1
    else:
        dic[num] += 1
This would produce the dict you want with the numbers and their occurrences.
You can refer to other answers as to how you should solve this problem more efficiently, but to answer the question you're asking (Why doesn't this nested loop work?):
To visualize what your nested loop is doing consider the following input:
0 0 0 0 0
Your algorithm starts every count at 1. Then, for the first 0, it counts the 0s that follow it:
0 0 0 0 0
  ^ ^ ^ ^ (4)
then, for the second 0,
0 0 0 0 0
    ^ ^ ^ (3)
then,
0 0 0 0 0
      ^ ^ (2)
and finally,
0 0 0 0 0
        ^ (1)
What happens is that it counts the same 0s multiple times over. In this instance it will report
11 0's (the initial 1, plus 4+3+2+1) instead of 5.
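itertools can help here, with two caveats: groupby only groups consecutive equal items, so the input needs to be sorted first, and the code below joins the input into one string and treats each character as a number, so it only handles single digits.
from itertools import groupby
def group_by_kv_l(n):
    l = []
    for k, v in groupby(n):
        l.append((len(list(v)),int(k)))
    return l
def group_by_kv_d(n):
    d = {}
    for k, v in groupby(n):
        d[(int(k))] = len(list(v))
    return d
if __name__ == "__main__":
    n = input().split()
    n = "".join(n)
    print(
        "STDIN: {}".format(n)
    )
    print(
        group_by_kv_l(n)
    )
    print(
        group_by_kv_d(n)
    )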
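For unsorted input or multi-digit numbers, groupby can be applied to a sorted list of integers instead. This is a minimal sketch rather than the answer's own code; the function name group_counts and the sample list are assumptions for illustration.

```python
from itertools import groupby

def group_counts(numbers):
    """Count occurrences by grouping equal values after sorting."""
    # groupby only merges adjacent equal items, hence the sort.
    return {key: len(list(group)) for key, group in groupby(sorted(numbers))}

ar = [1, 2, 3, 4, 4, 9, 9, 0, 0, 0, 12, 12]
result = group_counts(ar)
print(result)
```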

Python: Iterate and Print Integer

In Java, we can do something like:
int i = 0;
while (i < 10)
    System.out.println(i++);
which prints i and then increments it. Can the same be done in Python?
EDIT:
Specifically, I'd like to do something like:
words = ["red","green","blue"]
current_state = 0
for word in words:
    for char in word:
        print(char,current_state,current_state+1)
Result
r 0 1
e 1 2
d 2 3
g 3 4
r 4 5
e 5 6
....
If you want the equivalent of the ++ operator in Java, the answer is no. Python requires you to do:
i += 1
on its own line.
However, you may be looking for enumerate, which allows you to keep track of what index you are at while iterating over a container:
>>> for i, j in enumerate(['a', 'b', 'c', 'd']):
...     print(i, j)
...
0 a
1 b
2 c
3 d
>>>
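For the nested loop in the question's edit, enumerate can number the characters across all words if the words are flattened first. A minimal sketch, assuming the words list from the question; the names rows, state and next_state are illustrative.

```python
from itertools import chain

words = ["red", "green", "blue"]

# chain.from_iterable flattens the words into one stream of characters;
# enumerate numbers that stream starting from 0.
rows = [(char, state, state + 1)
        for state, char in enumerate(chain.from_iterable(words))]
for char, state, next_state in rows:
    print(char, state, next_state)
```

Keeping a plain counter variable and writing current_state += 1 inside the inner loop would work just as well; the flattened version simply avoids managing the counter by hand.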
i = 0
while i < 10:
    print(i)  # print first, then increment, to match Java's post-increment i++
    i += 1
