I'm new to Python and am working on a program that counts the occurrences of words in a simple text file. The program is run from the command line with the name of the text file as an argument, so I have included code for reading the command line arguments. The code is below:
import sys
count={}
with open(sys.argv[1],'r') as f:
    for line in f:
        for word in line.split():
            if word not in count:
                count[word] = 1
            else:
                count[word] += 1
print(word,count[word])
file.close()
count is a dictionary to store the words and the number of times they occur. I want to be able to print out each word and the number of times it occurs, starting from most occurrences to least occurrences.
I'd like to know if I'm on the right track, and if I'm using sys properly. Thank you!!
What you did looks fine to me. You could also use collections.Counter (available in Python 2.7 and newer), which does the counting for you and can list the words from most to least common. My solution would look like this, though it can probably be improved:
import sys
from collections import Counter

c = Counter()
with open(sys.argv[1], 'r') as f:
    for line in f:
        c.update(line.split())  # pass a list of words; a bare string would count characters
for word, n in c.most_common():  # most frequent words first
    print(word, n)
Your final print doesn't have a loop, so it will just print the count for the last word you read, which still remains as the value of word.
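Also, when you open the file in a with statement (a context manager), you don't need to close() the file handle; it is closed automatically when the block ends.
Finally, a comment suggested removing the trailing newline from each line before you split. Note that line.split() with no argument already treats the newline as whitespace and discards it, so this only matters if you split on a specific separator.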
For a simple program like this, it's probably not worth the trouble, but you might want to look at defaultdict from the collections module to avoid the special case for initializing a new key in the dictionary.
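For comparison, here is a minimal sketch of the same word count using collections.defaultdict, which removes the special case for a new key. The function name count_words is hypothetical, and the result is sorted by count in descending order, as the question asks.

```python
from collections import defaultdict


def count_words(lines):
    """Count whitespace-separated words in an iterable of lines, most frequent first."""
    counts = defaultdict(int)  # a missing key starts at int() == 0
    for line in lines:
        for word in line.split():  # split() with no argument also drops the newline
            counts[word] += 1
    # sort (word, count) pairs by count, highest first
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


print(count_words(["a b a\n", "b a c\n"]))  # [('a', 3), ('b', 2), ('c', 1)]
```

To use it with the file from the question, pass the open file object: with open(sys.argv[1]) as f, loop over count_words(f) and print each word and count.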
I just noticed a typo: you open the file as f but you close it as file. As tripleee said, you shouldn't close files that you open in a with statement. Also, it's bad practice to use the names of builtin functions, like file or list, for your own identifiers. Sometimes it works, but sometimes it causes nasty bugs. And it's confusing for people who read your code; a syntax highlighting editor can help avoid this little problem.
To print the data in your count dict in descending order of count you can do something like this:
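items = list(count.items())  # in Python 3, items() is a view, so copy it into a list
items.sort(key=lambda kv: kv[1], reverse=True)  # sort (word, count) pairs by count
print('\n'.join('%s: %d' % (k, v) for k, v in items))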
See the Python Library Reference for more details on the list.sort() method and other handy dict methods.
I did something similar using the re module. My program computes the average number of words per line in a text file; to get the number of words on each individual line, you would need to adapt it.
import re

# this program gets the average number of words per line
def main():
    # get the name of the file
    filename = input('Enter a filename: ')
    try:
        # open the file and read its contents
        with open(filename, 'r') as infile:
            contents = infile.read()
    except IOError:
        print('An error occurred when trying to read')
        print('the file', filename)
        return
    # count newline characters (lines) and runs of word characters (words)
    lines = len(re.findall(r'\n', contents))
    count = len(re.findall(r'\w+', contents))
    # avoid dividing by zero when the file contains no newline
    average = count // lines if lines else count
    # display the file contents
    print(contents)
    print('There is an average of', average, 'words per line')

# call main
main()
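If you need the number of words on each line rather than one overall average, a sketch along these lines should work. It reuses the same \w+ pattern; the function name words_per_line is hypothetical.

```python
import re


def words_per_line(text):
    """Return the number of words (runs of word characters) on each line of text."""
    # splitlines() also handles a last line without a trailing newline
    return [len(re.findall(r'\w+', line)) for line in text.splitlines()]


counts = words_per_line("one two three\nfour five\n")
print(counts)  # [3, 2]
if counts:
    print('average:', sum(counts) / len(counts))  # average: 2.5
```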
I tried to find and count specific three-word phrases in .txt files using this code:
phrases = ['hi there you','eat sausage bread', ...]
with open('test.txt') as f:
    for word in phrases:
        contents = f.read()
        count = contents.count('word')
        print(word, count)
Python lists every phrase, but the counts are wrong: the first phrase always gets 63 and all the following ones get 0. Since I have more than 100 phrases and lots of different files, counting each phrase individually (which, by the way, works with this script) would be a waste of time. If someone could point out my obvious mistake or suggest a solution, I'd be very thankful.
You read your entire file into contents for each word. Since you never restore the file pointer to the start of the file, after the first read it only stores an empty string.
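Fix it by reading the file only once. Also pass the variable word to count(), not the string literal 'word', which counts occurrences of the text "word" itself: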
with open('test.txt') as f:
    contents = f.read()
for word in phrases:
    count = contents.count(word)
    print(word, count)
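Because the question mentions many files, here is a minimal sketch that reads each file once and counts every phrase in it. The function name count_phrases is hypothetical. Note that str.count counts non-overlapping, case-sensitive occurrences.

```python
def count_phrases(paths, phrases):
    """Return {path: {phrase: count}}, reading each file only once."""
    results = {}
    for path in paths:
        with open(path, 'r') as f:
            contents = f.read()  # one read; the file position is now at the end
        # str.count counts non-overlapping, case-sensitive matches
        results[path] = {phrase: contents.count(phrase) for phrase in phrases}
    return results
```

To process every .txt file in the current directory (an assumed layout), you could call count_phrases(sorted(glob.glob('*.txt')), phrases) after import glob.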
My code is trying to count how many words are in the file printed above. Afterwards, I would like to be able to enter a word and have the code tell me how many times that word appears in the text and at which positions.
Here is the code:
import os
os.path.isfile('text1.text')
file = open('text1.txt','r')
print(file.readline())
count = 0
with open(text1, "rb") as fp:
    data = data.translate(string>maketrans("",""), string.punctuation)
    for word in data.split():
        if word in input_list:
            count += 1
print(count)
The first problem with your code: os.path.isfile('text1.text') only tests whether the file exists and returns True or False. Calling it without using the result (for example, in an if condition) does nothing. (Also note that you check for 'text1.text' but open 'text1.txt'.)
Now for why your code prints the first line but does not count words. The first time, you open the file by its name ('text1.txt'), which works. The second time, you pass the variable text1 to open(), and as far as I can see from the code you provided, there is no such variable. So the correct way would be something like this:
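import string

input_list = input('Words to count: ').split()  # assumption: input_list was never defined
count = 0
with open('text1.txt', "r") as fp:  # pass a string, and use "r" ('b' is for binary files)
    data = fp.read()  # read the contents before processing them
    data = data.translate(str.maketrans('', '', string.punctuation))  # Python 3: delete punctuation
    for word in data.split():
        if word in input_list:
            count += 1
print(count)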
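Additionally, data is never assigned before data.translate is called, so you need to read the file into it first (data = fp.read()), and input_list is not defined anywhere in the code you posted. Also, string.maketrans only exists in Python 2; in Python 3, str.translate takes a single table, so use data.translate(str.maketrans('', '', string.punctuation)).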
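The question also asks for the positions of the entered word. Here is a minimal sketch, assuming "position" means the index of the word among all words (counting from 0) and that punctuation and case should be ignored; the function name find_word is hypothetical.

```python
import string


def find_word(text, target):
    """Return (count, positions) of target among the words of text, ignoring punctuation and case."""
    # delete punctuation characters, then split on whitespace
    cleaned = text.translate(str.maketrans('', '', string.punctuation))
    words = cleaned.lower().split()
    positions = [i for i, w in enumerate(words) if w == target.lower()]
    return len(positions), positions


count, positions = find_word("The cat sat. The cat ran!", "cat")
print(count, positions)  # 2 [1, 4]
```

With a real file, you would call find_word(fp.read(), input('Enter a word: ')) inside the with block.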
I have a text file that I need to search, and it may contain uppercase or lowercase letters. How can I do this in Python?
Maybe you should spend more time writing the question if you expect us to invest time in answering it. Nevertheless, from what I understand, you are looking for something like this:
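import sys

with open(sys.argv[1], "r") as f:  # argv[0] is the script itself; argv[1] is the first argument
    for row in f:
        for ch in row:  # "ch" avoids shadowing the built-in chr()
            if ch.isupper():
                print(ch, "uppercase")
            elif ch.islower():  # skip spaces, digits, newlines and punctuation
                print(ch, "lowercase")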
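If the goal is instead to find a word regardless of whether it is written in uppercase or lowercase (an assumption about what the question means), one option is to compare case-folded strings. The function name find_lines is hypothetical.

```python
def find_lines(lines, keyword):
    """Yield (line_number, line) for each line containing keyword, ignoring case."""
    key = keyword.casefold()  # casefold() is a stronger lower() meant for caseless matching
    for number, line in enumerate(lines, start=1):
        if key in line.casefold():
            yield number, line.rstrip('\n')


sample = ["Hello World\n", "nothing here\n", "HELLO again\n"]
for number, line in find_lines(sample, "hello"):
    print(number, line)
```

An open text file can be passed in place of sample, since iterating over it yields lines.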
I'm trying to write a Python script that uses a particular external application belonging to the company I work for. I can generally figure things out for myself when it comes to programming and scripting, but this time I am truly lost!
I can't figure out why the while loop won't work as intended. It doesn't raise any errors, which doesn't help me. It just seems to skip the important part of the code in the middle of the loop and then goes on to increment "count" as it should afterwards!
f = open('C:/tmp/tmp1.txt', 'w') #Create a temporary textfile
f.write("TEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\nTEXTFILE\n") #Put some simple text in there
f.close() #Close the file
count = 0 #Insert the line number from the text file you want to begin with (first line starts with 0)
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt')) #Get the number of lines from the textfile
f = open('C:/tmp/tmp2.txt', 'w') #Create a new textfile
f.close() #Close it
while (count < num_lines): #Keep the loop within the starting line and total number of lines from the first text file
    with open('C:/tmp/tmp1.txt', 'r') as f: #Open the first textfile
        line2 = f.readlines() #Read these lines for later input
        for line2[count] in f: #For each line from chosen starting line until last line from first text file,...
            with open('C:/tmp/tmp2.txt', 'a') as g: #...with the second textfile open for appending strings,...
                g.write("hello\n") #...write 'hello\n' each time while "count" < "num_lines"
    count = count + 1 #Increment the "count"
I think everything works up until: "for line2[count] in f:"
The real code I'm working on is somewhat more complicated, and the application I'm using isn't exactly for sharing, so I have simplified the code to give silly outputs instead just to fix the problem.
I'm not looking for alternative code, I'm just looking for a reason why the loop isn't working so I can try to fix it myself.
All answers will be appreciated, and thanking everyone in advance!
Cormac
Some comments:
num_lines = sum(1 for line1 in open('C:/tmp/tmp1.txt'))
Why? What's wrong with len(open(filename, 'rb').readlines())?
while (count < num_lines):
    ...
    count = count + 1
This is bad style, you could use:
for i in range(num_lines):
    ...
Note that I named your index i, which is universally recognized, and that I used range and a for loop.
Now, your problem, as I said in the comment, is that f is a file (that is, a stream of bytes with a position pointer) and you've already read all the lines from it. So when you do for line2[count] in f:, Python tries to read a line into line2[count] (this is a bit unusual: you almost never use a list element as a for loop target, but apparently it is allowed), finds that there is no line left to read, and never executes the body of the loop.
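The effect described above is easy to reproduce. This sketch uses io.StringIO as a stand-in for a real file (an assumption made so the example needs no file on disk); it also shows that seek(0) rewinds the position.

```python
import io

f = io.StringIO("a\nb\nc\n")      # behaves like a text file opened with 'r'
lines = f.readlines()             # reads everything; the position is now at the end
leftover = [line for line in f]   # nothing left to read, so the loop body never runs
print(lines, leftover)            # ['a\n', 'b\n', 'c\n'] []

f.seek(0)                         # move the position back to the start
print([line for line in f])       # ['a\n', 'b\n', 'c\n']
```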
Anyway, you want to read a file, line by line, starting from a given line number? Here's a better way to do that:
from itertools import islice

start_line = 0  # change this
filename = "foobar"  # also this
with open(filename, 'r') as f:
    for line in islice(f, start_line, None):
        print(line, end='')  # each line already ends with a newline
I realize you don't want alternative code, but your code really is needlessly complicated.
If you want to iterate over the lines in the file f, I suggest replacing your "for" line with
for line in line2:
    # do something with "line"...
You put the lines in an array called line2, so use that array! Using line2[count] as a loop variable doesn't make sense to me.
You seem to misunderstand how the 'for line in f' loop works. It iterates over the file, reading one line at a time until there are no lines left. But by the time you start the loop, all the lines have already been read (via f.readlines()), and the file's current position is at the end. You could get what you want by calling f.seek(0), but that doesn't seem like a good choice anyway, since you would be reading the file again, and file I/O is slow.
Instead, you want to do something like this:
for line in line2[count:]:  # iterate over the lines already read, starting with line `count`
    do_smth_with(line)
I'm pretty sure I'm over thinking this and there's a simple outcome for it, but I just can't seem to put it all together.
I'm looking for a kind of a search method. I'd like a Python script search a text file for a keyword and count how many lines it appears on. Though if the keyword comes up on a single line multiple times, I'd like to still only count it once.
Long story short: if a keyword comes up on a line, I want to count that line and use the total in what will be a math equation.
Any help is greatly appreciated! Thanks in advance.
You can define the following function.
def lcount(keyword, fname):
    with open(fname, 'r') as fin:
        return sum(1 for line in fin if keyword in line)
Now if you want to know the number of lines containing "int" in "foo.cpp", you do:
print(lcount('int', 'foo.cpp'))
An alternative that you can do on the command line (if you are on an appropriate platform) is:
grep int foo.cpp | wc -l
A non-Python Unix solution is fairly immediate:
"search a text file for a keyword" is a grep
"count how many lines" is a wc
Do you have difficulty implementing the essence of either of these in Python?
Assuming f is the file object,
lines = f.readlines()
print(len([line for line in lines if keyword in line]))
Perhaps you could try this:
def kwdCount(textContent, keyword):
    lines = textContent.split("\n")
    count = len([1 for line in lines if line.find(keyword) != -1])
    return count

>>> yourTextFile = "hello world\n some words here\n goodbye world"
>>> kwdCount(yourTextFile, "world")
2
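Since the question mentions feeding the count into a further calculation, here is a sketch that returns both the number of matching lines (each line counted once, however many times the keyword appears on it) and the total number of lines. The function name keyword_line_stats and the percentage calculation are illustrative assumptions.

```python
def keyword_line_stats(lines, keyword):
    """Return (lines containing keyword, total lines); a line counts once even with repeats."""
    matching = 0
    total = 0
    for line in lines:
        total += 1
        if keyword in line:  # a membership test, so repeats on one line still add only 1
            matching += 1
    return matching, total


matching, total = keyword_line_stats(["int a; int b;\n", "float c;\n", "int d;\n"], "int")
print(matching, total)  # 2 3
if total:
    print('percentage:', 100 * matching / total)
```

An open file can be passed as lines, for example keyword_line_stats(fin, 'int') inside a with open('foo.cpp') as fin: block.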