10
5
-1
-1
-1
1
1
0
2
...
If I want to count the number of occurrences of each number in a file, how do I do it in Python?
This is almost the exact same algorithm described in Anurag Uniyal's answer, except using the file as an iterator instead of readline():
from collections import defaultdict
try:
    from io import StringIO  # 2.6+, 3.x
except ImportError:
    from StringIO import StringIO  # 2.5
data = defaultdict(int)
#with open("filename", "r") as f:  # if a real file
with StringIO(u"10\n5\n-1\n-1\n-1\n1\n1\n0\n2") as f:  # io.StringIO needs unicode text on 2.x
    for line in f:
        data[int(line)] += 1
for number, count in data.items():
    print("%d was found %d times" % (number, count))
Counter is your best friend:)
http://docs.python.org/dev/library/collections.html#counter-objects
For Python 2.5 and 2.6: http://code.activestate.com/recipes/576611/
>>> from collections import Counter
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
...
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
# or just cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
For this question:
print Counter(int(line.strip()) for line in open("foo.txt", "rb"))
##output
Counter({-1: 3, 1: 2, 0: 1, 2: 1, 5: 1, 10: 1})
I think what you call a map is, in Python, a dictionary.
Here is a useful link on how to use one: http://docs.python.org/tutorial/datastructures.html#dictionaries
For a good solution, see the answer from Stephan or Matthew, but also take some time to understand what that code does :-)
Read the lines of the file into a list l, e.g.:
l = [int(line) for line in open('filename','r')]
Starting with a list of values l, you can create a dictionary d that gives you for each value in the list the number of occurrences like this:
>>> l = [10,5,-1,-1,-1,1,1,0,2]
>>> d = dict((x,l.count(x)) for x in l)
>>> d[1]
2
EDIT: as Matthew rightly points out, this is hardly optimal, because l.count(x) rescans the whole list for every element. Here is a version using defaultdict that makes a single pass:
from collections import defaultdict
d = defaultdict(int)
for line in open('filename','r'):
    d[int(line)] += 1
New in Python 3.1:
from collections import Counter
with open("filename","r") as lines:
    print(Counter(lines))
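One thing to watch with Counter(lines): the keys are the raw lines as strings, trailing newline included. If integer keys are wanted, a sketch along these lines may help. It assumes one integer per line, and StringIO is only a stand-in for the real file so the example runs on its own.

```python
from collections import Counter
from io import StringIO

SAMPLE = "10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n"

# StringIO stands in for open("filename", "r") in this sketch.
with StringIO(SAMPLE) as lines:
    raw_counts = Counter(lines)  # keys are strings such as '-1\n'

with StringIO(SAMPLE) as lines:
    # Convert each non-blank line to int before counting.
    int_counts = Counter(int(line) for line in lines if line.strip())

print(raw_counts["-1\n"])  # 3
print(int_counts[-1])      # 3
```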
Use collections.defaultdict so that the default count for anything is zero.
After that, loop through the lines in the file using file.readline and convert each line to an int.
Increment the counter for each value in your countDict.
Finally, go through the dict using for intV, count in countDict.iteritems() and print the values.
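The steps above can be sketched roughly as follows for Python 3 (where iteritems() becomes items()). This is only a sketch: count_numbers is a hypothetical helper name, it assumes one integer per line, and StringIO stands in for a real file object.

```python
from collections import defaultdict
from io import StringIO


def count_numbers(f):
    """Count how often each integer appears, reading one value per line."""
    count_dict = defaultdict(int)  # missing keys start at 0
    while True:
        line = f.readline()
        if not line:  # readline() returns '' at end of file
            break
        if line.strip():  # skip blank lines
            count_dict[int(line)] += 1
    return count_dict


# StringIO stands in for open("filename") so the sketch runs on its own.
sample = StringIO("10\n5\n-1\n-1\n-1\n1\n1\n0\n2\n")
counts = count_numbers(sample)
for value, count in counts.items():
    print(value, "was found", count, "times")
```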
Use a dictionary where every line is a key and the count is the value. Increment the count for every line, and if there is no dictionary entry for a line yet, initialize it with 1 in the except clause. This should work with older versions of Python.
def count_same_lines(fname):
    line_counts = {}
    for l in open(fname):
        l = l.rstrip()
        if l:
            try:
                line_counts[l] += 1
            except KeyError:
                line_counts[l] = 1
    print('cnt\ttxt')
    for k in line_counts.keys():
        print('%d\t%s' % (line_counts[k], k))
l = [10,5,-1,-1,-1,1,1,0,2]
d = {}
for x in l:
    d[x] = (d[x] + 1) if (x in d) else 1
There will be a key in d for every distinct value in the original list, and the values of d will be the number of occurrences.
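The same idea can also be written with dict.get, which returns a fallback value when the key is missing. This is just an alternative spelling of the loop above, not a different algorithm.

```python
l = [10, 5, -1, -1, -1, 1, 1, 0, 2]
d = {}
for x in l:
    # get() returns 0 when x has not been seen yet, so no membership test is needed
    d[x] = d.get(x, 0) + 1

print(d[-1])  # 3
```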
counter.py
#!/usr/bin/env python
import fileinput
from collections import defaultdict
frequencies = defaultdict(int)
for line in fileinput.input():
    frequencies[line.strip()] += 1
print frequencies
Example:
$ perl -E'say 1*(rand() < 0.5) for (1..100)' | python counter.py
defaultdict(<type 'int'>, {'1': 52, '0': 48})
I have a file formatted this way -
{'apple': 4, 'orange': 3, 'peach': 1}
{}
{'apple': 1, 'banana': 1}
{'peach': 1}
{}
{}
{'pear': 3}
...
[10k more lines like this]
I want to create a new text file to store the total count of each of these fruits/objects like this -
apple:110
banana:200
pineapple:50
...
How do I do this?
My attempt: I've tried using Python (If this is confusing, please skip it) -
f = open("fruits.txt","r")
lines = f.readlines()
f.close()
g = open("number_of_fruits.txt","a")
for line in lines: #Iterating through every line,
    for character in "{}'": #Removing extra characters,
        line = line.replace(character, "")
    for i in range(0,line.count(":")): #Using the number of colons as a counter,
        line = line[ [m.start() for m in re.finditer("[a-z]",line)][i] : [m.start() for m in re.finditer("[0-9]",line)][i] + 1 ] #Slice the line like this - line[ith time I detect any letter : ith time I detect any number + 1]
        #And then somehow store that number in temp, slicing however needed for every new fruit
        #Open a new file
        #First look if any of the fruits in my line already exist
        #If they do:
            #Convert that sliced number part of string to integer, add temp to it, and write it back to the file
        #else:
            #Make a newline entry with the object name and the sliced number from line.
The number of functions in Python is very overwhelming to begin with. And at this point I'm just considering using C, which is already a terrible idea.
Avoid using eval.
I would opt for treating it as JSON if you can ensure the formatting will be as above.
import json
from collections import Counter
with open('fruits.txt') as f:
    counts = Counter()
    for line in f.readlines():
        counts.update(json.loads(line.replace("'", '"')))
If you want the output as defined above:
for fruit, count in counts.items():
    print(f"{fruit}:{count}")
Updated Answer
Based on @DarryIG's literal_eval suggestion in the comments, which removes the need for JSON:
from ast import literal_eval
from collections import Counter
with open('fruits.txt') as f:
    counts = Counter()
    for line in f.readlines():
        counts.update(literal_eval(line))
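Since the question asks for the totals in a new text file, here is a rough sketch of that last step. write_totals is a hypothetical helper name, and the StringIO objects are in-memory stand-ins for open('fruits.txt') and open('number_of_fruits.txt', 'w'); the sample lines are taken from the question.

```python
from ast import literal_eval
from collections import Counter
from io import StringIO


def write_totals(lines, out):
    """Sum the per-line dicts and write one 'name:total' line per key."""
    counts = Counter()
    for line in lines:
        if line.strip():  # ignore blank lines
            counts.update(literal_eval(line.strip()))
    for fruit, total in sorted(counts.items()):  # alphabetical order
        out.write(f"{fruit}:{total}\n")
    return counts


# In-memory stand-ins for the real input and output files.
source = StringIO(
    "{'apple': 4, 'orange': 3, 'peach': 1}\n"
    "{}\n"
    "{'apple': 1, 'banana': 1}\n"
    "{'peach': 1}\n"
    "{'pear': 3}\n"
)
target = StringIO()
totals = write_totals(source, target)
print(target.getvalue())
```

With real files, the same function could be called inside two with open(...) blocks, opening the output file in 'w' mode so that reruns do not append to old results.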
You can use the standard library's ast.literal_eval to evaluate each line into a Python dictionary:
from ast import literal_eval
from collections import Counter
with open("input.txt", 'r') as inputFile:
    counts = Counter()
    for line in inputFile:
        a = literal_eval(line)
        counts.update(Counter(a))
print(dict(counts))
output:
{'apple': 5, 'orange': 3, 'banana': 1, 'peach': 2, 'pear': 3}
Using defaultdict and json:
import json
from collections import defaultdict
result = defaultdict(int)
with open('fruits.txt') as f:
    for line in f:
        data = json.loads(line.replace("'", '"'))
        for fruit, num in data.items():
            result[fruit] += num
print(result)
output
defaultdict(<class 'int'>, {'apple': 5, 'orange': 3, 'peach': 2, 'banana': 1, 'pear': 3})
EDIT: I would recommend using @BenjaminRowell's answer (I upvoted it). I will keep this one just for brevity.
EDIT2 (22 May 2020): If the file used double quotes instead of single quotes, this would be the ndjson/jsonlines format (here is an interesting discussion on the relationship between the two). You can use the ndjson or jsonlines packages to process it, e.g.:
import ndjson
from collections import Counter
with open('sample.txt') as f:
    # if using double quotes, you can do:
    #data = ndjson.load(f)
    # because it uses single quotes - read the whole file and replace the quotes
    data = f.read()
    data = ndjson.loads(data.replace("'", '"'))
    counts = Counter()
    for item in data:
        counts.update(item)
print(counts)
In my experience, this is an unusual task. I have searched in many different ways but still can't find an answer.
Here is the question.
I have a dict of Chinese phrase frequencies. It looks like:
{'中国':18950, '我们':16734, '我国':15400, ...}
What I need to do is count every single character's frequency. For example, the character '国' appears in two phrases ('中国' and '我国'), so this character's frequency should be:
{'国':(18950+15400)}
How can I achieve this?
Simple example,
d = {'abd':2, 'afd':3}
f = {}
for key in d:
    strlen = len(key)
    for i in range(strlen):
        if key[i] in f:
            f[key[i]] += d[key]
        else:
            f[key[i]] = d[key]
print f #gives {'a': 5, 'b': 2, 'd': 5, 'f': 3}
My way:
from collections import Counter
c={'中国':18950, '我们':16734, '我国':15400}
print(Counter([j for k,v in c.items() for i in k for j in [i]*v]))
Output:
Counter({'国': 34350, '我': 32134, '中': 18950, '们': 16734})
Something like this should work:
from collections import defaultdict
char_dict = defaultdict(int)
for phrase, count in phrase_dict.iteritems():
    for char in phrase:
        char_dict[char] += count
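On Python 3, where dict.iteritems() no longer exists, the same loop might look like the sketch below. It swaps in collections.Counter (so most_common() is available) and uses the three phrases from the question as sample data; phrase_dict is assumed to map each phrase to its frequency.

```python
from collections import Counter

phrase_dict = {'中国': 18950, '我们': 16734, '我国': 15400}

char_counts = Counter()
for phrase, count in phrase_dict.items():
    for char in phrase:
        # every character inherits the frequency of the phrase it appears in
        char_counts[char] += count

print(char_counts['国'])  # 34350, i.e. 18950 + 15400
```

Note that a character occurring twice within one phrase is counted twice for that phrase, which may or may not be what you want.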
d = {'中国':18950, '我们':16734, '我国':15400, ...}
q = 0
for i in d:
    if '国' in i:
        a = (d[i])
        q += a
print(q)
Pretty much what the title says: I want to create a dictionary with phone numbers as keys, and every time a new number is added I want its value to increment by one.
Like this: {'7806969':1 , '78708708' : 2} and so on.
nodes=[1,2,3,4,5,6,7,8,9]
customers=open('customers.txt','r')
calls=open('calls.txt.','r')
sorted_no={}
for line in customers:
    rows=line.split(";")
    if rows[0] not in sorted_no:
        sorted_no[rows[0]]=nodes[0]
    else:
        sorted_no[rows[0]]=
print(sorted_no)
That is the code I have so far. I tried creating a list for my problem, but that plan quickly fell apart.
Use a defaultdict, and sort the output if you actually want it sorted from least to most frequent:
from collections import defaultdict
sorted_no = defaultdict(int)
for line in customers:
    rows = line.split(";")
    sorted_no[rows[0]] += 1
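For the sorting step mentioned above, something like the following sketch could work. The customer lines are made-up sample data (the real customers.txt layout is only assumed to be semicolon-separated with the number first), and StringIO stands in for the open file.

```python
from collections import defaultdict
from io import StringIO

# Hypothetical sample data standing in for open('customers.txt').
customers = StringIO(
    "7801234567;Ann\n"
    "7807890123;Bob\n"
    "7801234567;Ann\n"
    "7808765432;Cy\n"
    "7801234567;Ann\n"
    "7807890123;Bob\n"
)

sorted_no = defaultdict(int)
for line in customers:
    sorted_no[line.split(";")[0]] += 1

# Sort (number, count) pairs by count, least frequent first.
by_frequency = sorted(sorted_no.items(), key=lambda item: item[1])
for number, count in by_frequency:
    print(number, count)
```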
Or just use a Counter dict:
from collections import Counter
with open('customers.txt') as customers:
    c = Counter(line.split(";")[0] for line in customers )
print(c.most_common())
To simply number each element in order, which works because you have no duplicates, use enumerate:
with open('customers.txt') as customers:
    sorted_no = {}
    for ind, line in enumerate(customers,1):
        rows=line.split(";")
        sorted_no[rows[0]] = ind
Or as a dict comprehension:
with open('customers.txt') as customers:
    sorted_no = {line.split(";")[0]:ind for ind, line in enumerate(customers,1)}
If order is important simply use:
from collections import OrderedDict
sorted_no = OrderedDict()
with open('customers.txt') as customers:
    sorted_no = OrderedDict((line.split(";")[0], ind) for ind, line in enumerate(customers,1))
enumerate(customers,1) pairs each line in customers with its index; passing 1 as the start value makes the numbering begin at 1 instead of 0.
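To make the numbering concrete, here is a tiny sketch with a made-up list of lines in place of the file (the numbers and names are hypothetical sample data):

```python
# Hypothetical lines standing in for the contents of customers.txt.
lines = ["7807890123;a\n", "7808765432;b\n", "7804922860;c\n"]

# Without a start value the index begins at 0 ...
zero_based = [ind for ind, line in enumerate(lines)]

# ... and with start=1 it begins at 1.
pairs = [(ind, line.split(";")[0]) for ind, line in enumerate(lines, 1)]

print(zero_based)  # [0, 1, 2]
print(pairs)
```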
If I understand you, all you need to do is increase the number you're using as you go:
sorted_no = {}
with open("customers.txt") as fp:
    for line in fp:
        number = line.split(";")[0]
        if number not in sorted_no:
            sorted_no[number] = len(sorted_no) + 1
This produces something like
{'7801234567': 4,
'7801236789': 6,
'7803214567': 9,
'7804321098': 7,
'7804922860': 3,
'7807890123': 1,
'7808765432': 2,
'7808907654': 5,
'7809876543': 8}
where the first unique phone number seen gets 1, and the second 2, etc.
This is probably one of the shorter ways to do it (thanks to Jon Clements in the comments):
#!/usr/bin/env python3.4
from collections import defaultdict
import itertools
sorted_no = defaultdict(itertools.count(1).__next__)
for line in customers:
    rows=line.split(";")
    # no need to put anything,
    # just use the key and it increments automagically.
    sorted_no[rows[0]]
itertools.count(1) produces an iterator, which is roughly equivalent to the generator returned by:
def lazy():
    counter = 0
    while True:
        counter += 1
        yield counter
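Here is a small self-contained run of that trick, using a made-up list of numbers instead of the file. The point it tries to show is that a repeated key keeps the value it was first given, because the factory is only called for missing keys.

```python
from collections import defaultdict
import itertools

# Each missing key gets the next value from the counter: 1, 2, 3, ...
sorted_no = defaultdict(itertools.count(1).__next__)

# Hypothetical sample numbers; '7807890123' appears twice.
for number in ["7807890123", "7808765432", "7807890123", "7804922860"]:
    sorted_no[number]  # looking up a missing key calls __next__ and stores the result

print(dict(sorted_no))
```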
I left my original answer so people can learn about the default-binding gotcha, or maybe even use it if they want:
#!/usr/bin/env python3.4
from collections import defaultdict
def lazy_gen(current=[0]):
    current[0] += 1
    return current[0]
sorted_no = defaultdict(lazy_gen)
for line in customers:
    rows=line.split(";")
    # no need to put anything,
    # just use the key and it increments automagically.
    sorted_no[rows[0]]
It works because Python evaluates default argument values only once, when the function is defined. When the default is a mutable object (a list in this case), changes made to it persist between calls, so the function's return value changes each time.
It's a little weird though :)
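If you want to see the gotcha in isolation, this short sketch calls the function a few times and then inspects the stored default. Nothing here is specific to the phone-number problem.

```python
def lazy_gen(current=[0]):
    # The list in the signature is created once, at definition time,
    # and the same list object is reused on every call.
    current[0] += 1
    return current[0]


first = lazy_gen()
second = lazy_gen()
third = lazy_gen()
print(first, second, third)   # 1 2 3

# The shared default list now remembers the last value.
print(lazy_gen.__defaults__)  # ([3],)

# Passing an explicit list bypasses the shared default entirely.
fresh = lazy_gen([10])
print(fresh)                  # 11
```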
What I want to do is read from a file and then, for each word, add it to a dictionary along with its number of occurrences.
Example:
'today is sunday. tomorrow is not sunday.'
My dictionary would then be this:
{'today': 1, 'is': 2, 'sunday': 2, 'tomorrow': 1, 'not': 1}
The way I'm going about it is to use readline and split to create a list, and then add each element and its value to an empty dictionary, but it's not really working so far. Here's what I have so far, although it's incomplete:
file = open('any_file,txt', 'r')
for line in file.readline().split():
    for i in range(len(line)):
        new_dict[i] = line.count(i) # I'm getting an error here as well, saying that
return new_dict                     # I can't convert int to str implicitly
The problem with this is that when my dictionary updates as each line is read, the value of a word won't accumulate. So if 'sunday' occurred 3 times in another line, my dictionary would contain {'sunday': 3} instead of {'sunday': 5}. Any help? I have no idea where to go from here and I'm new to all of this.
You are looking for collections.Counter.
e.g.:
from collections import Counter
from itertools import chain
with open("file.txt") as file:
    counts = Counter(chain.from_iterable(line.split() for line in file))
(This uses itertools.chain.from_iterable() with a generator expression.)
Note that your example only works on the first line. I presume this wasn't intentional, so this solution works across the whole file (it's trivial to swap that around).
Here is a simple version that doesn't deal with punctuation:
from collections import Counter
counter = Counter()
with open('any_file,txt', 'r') as file:
    for line in file:
        for word in line.split():
            counter[word] += 1
It can also be written like this:
from collections import Counter
with open('any_file,txt', 'r') as file:
    counter = Counter(word for line in file for word in line.split())
Here's one way to solve the problem using a dict:
counter = {}
with open('any_file,txt', 'r') as file:
    for line in file:
        for word in line.split():
            if word not in counter:
                counter[word] = 1
            else:
                counter[word] += 1
Try this:
file = open('any_file.txt', 'r')
myDict = {}
for line in file:
    lineSplit = line.split(" ")
    for x in xrange(len(lineSplit)):
        if lineSplit[x] in myDict.keys(): myDict[lineSplit[x]] += 1
        else: myDict[lineSplit[x]] = 1
file.close()
print myDict
Do you use Python 3 or Python 2.7?
If yes, use Counter from the collections library:
import re
from collections import Counter
words = re.findall(r'\w+', open('any_file.txt').read().lower())
Counter(words).most_common(10)
Note that most_common returns a list of tuples, though. It should be easy to turn a list of tuples into a dictionary.
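For that last conversion, passing the list of (word, count) tuples to dict() is probably enough. The sketch below uses the example sentence from the question instead of reading any_file.txt, so it runs on its own.

```python
import re
from collections import Counter

# Example sentence from the question, used in place of the file contents.
text = 'today is sunday. tomorrow is not sunday.'
words = re.findall(r'\w+', text.lower())  # \w+ drops the punctuation

counter = Counter(words)
top_two = dict(counter.most_common(2))  # list of tuples -> dict
everything = dict(counter)              # a Counter converts directly, too

print(top_two)
print(everything)
```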
I am looking for the most elegant way to do the following:
Let's say that I want to count number of times each integer appears in a list; I could do it this way:
x = [1,2,3,2,4,1,2,5,7,2]
dicto = {}
for num in x:
    try:
        dicto[num] = dicto[num] + 1
    except KeyError:
        dicto[num] = 1
However, I think that
try:
    dicto[num] = dicto[num] + 1
except KeyError:
    dicto[num] = 1
is not the most elegant way to do it; I think that I saw the above code replaced by a single line. What is the most elegant way to do this?
I realize that this might be a duplicate, but I looked around and couldn't find what I was looking for.
Thank you in advance.
Use the Counter class
>>> from collections import Counter
>>> x = [1,2,3,2,4,1,2,5,7,2]
>>> c = Counter(x)
Now you can use the Counter object c as a dictionary.
>>> c[1]
2
>>> c[10]
0
(This works for nonexistent values too.)
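One detail that may matter in practice: looking up a missing key in a Counter returns 0 without adding the key, whereas a defaultdict(int) stores the key as a side effect of the lookup. A quick sketch of the difference:

```python
from collections import Counter, defaultdict

x = [1, 2, 3, 2, 4, 1, 2, 5, 7, 2]

c = Counter(x)
missing = c[10]              # 0, and the key is NOT added
counter_has_key = 10 in c    # False

d = defaultdict(int)
for i in x:
    d[i] += 1
also_missing = d[10]         # 0, but the lookup inserts the key
default_has_key = 10 in d    # True

print(missing, counter_has_key, also_missing, default_has_key)
```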
>>> from collections import defaultdict
>>> x = [1,2,3,2,4,1,2,5,7,2]
>>> d = defaultdict(int)
>>> for i in x:
...     d[i] += 1
...
>>> dict(d)
{1: 2, 2: 4, 3: 1, 4: 1, 5: 1, 7: 1}
Or just collections.Counter, if you are on Python 2.7+.
Bucket sort, as you're doing, is entirely algorithmically appropriate (discussion). This seems ideal when you don't need the additional overhead from Counter:
from collections import defaultdict
wdict = defaultdict(int)
for word in words:
    wdict[word] += 1