I have a function that is meant to count the number of times each key in a dictionary occurs in a list of files (list_of_docs).
def calc(dictionary):
    for token in dictionary:
        count = 0
        for files in list_of_docs:
            current_file = open(files.name, 'r')
            text = current_file.read()
            line = text.split()
            if token in line:
                count +=1
    return count
When I call this function, it doesn't stop. When I interrupt the program, the traceback shows it is stuck on the line line = text.split(). (And if I remove that line, it gets stuck on text = current_file.read().) Why isn't the program stopping?
You are not closing your files. Call current_file.close() when you have finished reading each one. Alternatively, you can wrap the file reading in a with statement, which closes the file for you:
with open(files.name, 'r') as f:
    text = f.read()
    ...
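A minimal sketch of the counting function with every file opened in a with block. It assumes list_of_docs holds file paths (the objects in the question expose a .name attribute instead), and count_tokens is a hypothetical name. It returns, for each token, the number of files whose words include it, which is one reading of what the question intends.

```python
def count_tokens(dictionary, list_of_docs):
    """Return {token: number of files whose words include the token}."""
    # Read each file once and keep its words in a set for fast lookups.
    word_sets = []
    for path in list_of_docs:  # assumption: entries are file paths
        with open(path, "r") as current_file:  # closed when the block ends
            word_sets.append(set(current_file.read().split()))

    counts = {}
    for token in dictionary:
        counts[token] = sum(1 for words in word_sets if token in words)
    return counts
```

Reading each file once, outside the token loop, also avoids reopening every file for every key in the dictionary.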
I have a .txt file that I created with multiple lines.
When I run a for loop, with a count accumulator, it skips lines.
It skips the top line, and starts with the second, prints the fourth, the sixth, etc.
What is it I'm missing?
def main():
    # Open file line_numbers.txt
    data_file = open('line_numbers.txt', 'r')
    # initialize accumulator
    count = 1
    # Read all lines in data_file
    for line in data_file:
        # Get the data from the file
        line = data_file.readline()
        # Display data retrieved
        print(count, ": ", line)
        # add to count sequence
        count += 1
Try removing the "line=data_file.readline()" altogether? I suspect the "for line in data_file:" is also a readline operation.
Your for loop is iterating over data_file, and your readline() call is competing with it. Erase the line = data_file.readline() line of your code to get this result:
# Read all lines in data_file
count = 1
for line in data_file:
    # Display data retrieved
    print(count, ": ", line)
    # add to count sequence
    count += 1
for line in data_file already gets the text of each line for you; the subsequent call to readline() then gets the following line. In other words, removing the call to readline() will do what you want. At the same time, you don't need to keep track of an accumulator variable yourself. Python has a built-in way of doing this, enumerate, which counts from 0 unless you pass start=1:
data_file = open('line_numbers.txt', 'r')
for count, line in enumerate(data_file, start=1):
    ...
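A self-contained sketch of the enumerate approach. io.StringIO stands in for the opened file so the example runs without line_numbers.txt on disk, and the sample lines and the name numbered are made up for illustration.

```python
import io

# Stand-in for open('line_numbers.txt', 'r'); the contents are illustrative.
data_file = io.StringIO("first\nsecond\nthird\n")

numbered = []
for count, line in enumerate(data_file, start=1):
    text = line.rstrip("\n")  # drop the newline each line carries
    numbered.append((count, text))
    print(count, ":", text)
```

Every line is visited exactly once because only the for loop reads from the file.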
I have the following problem. I am supposed to open a CSV file (it's an Excel table) and read it without using any library.
I have tried a lot already, and I now have the first row in a tuple, and that tuple in a list. But that is only the first line, the header; no other rows.
This is what I have so far.
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
The output should be: every line in a tuple, and all the tuples in a list.
My question is now, how can I read the other lines in Python?
I am really sorry, I am new to programming all together and so I have a real hard time finding my mistake.
Thank you very much in advance for helping me out!
This problem has come up many times on Stack Overflow, so you should be able to find working code.
It is much better to use the csv module for this, though.
Your indentation is wrong: return results runs right after the first line is read, so the function exits and never tries to read the other lines.
Even after fixing that, there are other problems, so it still will not read the next lines.
You use readline(), so you read only the first line, and your loop keeps working on that same line. It may never end, because text never becomes ''.
You should use read() to get all the text and then split it into lines with split("\n"), or use readlines() to get all lines as a list, in which case you don't need split(). Or you can use for line in file:. In all three cases you don't need the while loop.
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        text = file.read()
        for line in text.split('\n'):
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        lines = file.readlines()
        for line in lines:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        for line in file:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
None of these versions will work correctly if an item contains '\n' or , that should not be treated as the end of a row or as a separator between items. Such items are wrapped in " ", and those quotes then have to be removed as well. The standard csv module handles all of these problems.
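For comparison, a minimal sketch of the same function built on the standard csv module, which handles quoted items that contain commas or line breaks. It departs from the "no library" requirement in the question, so it is only meant to show what the module takes care of.

```python
import csv


def read_csv(path):
    """Return every row of the CSV file at path as a tuple, in a list."""
    # newline="" lets the csv module deal with line breaks inside quoted items
    with open(path, "r", newline="") as file:
        return [tuple(row) for row in csv.reader(file)]
```

A row such as "Smith, J",42 comes back as ('Smith, J', '42'), with the quotes removed and the comma inside the item kept.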
Your code is pretty good and you are close to the goal. With read() instead of readline() and the while loop removed, it becomes:
with open(path, 'r+') as file:
    results=[]
    text = file.read()
    #while text != '':
    for line in text.split('\n'):
        a=line.split(',')
        b=tuple(a)
        results.append(b)
    return results
Your code, for comparison:
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
So enjoy learning :)
One caveat: if the file ends with a newline (or a blank line), split('\n') yields an empty string at the end, which results in an ugly tuple at the end of the list like ('',) (which looks like a smiley).
To prevent this, you have to check for empty lines: if line != '': right after the for line will do the trick.
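A short sketch of the empty-line check applied to the read() version. The function name read_csv follows the earlier answer, and plain 'r' mode is used here because the file is only read.

```python
def read_csv(path):
    """Return each non-empty line of the file as a tuple of its items."""
    results = []
    with open(path, "r") as file:
        for line in file.read().split("\n"):
            if line != "":  # skip the empty string left by a trailing newline
                results.append(tuple(line.split(",")))
    return results
```

Without the check, a file containing "a,b\n1,2\n" would produce a third entry ('',), because "a,b\n1,2\n".split("\n") ends with an empty string.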
I have a simple program which looks through a file, finds any numbers inside, and adds them up into a variable called running_total. My issue seems to be that my file name is the thing that is being read instead of its contents.
import re
file = input('Enter file name:')
open(file)
print(file)
running_total = None
for line in file:
    line = line.rstrip()
    numbers = re.findall("[0-9]+", line)
    print(numbers)
    for number in numbers:
        running_total += float(number)
print(running_total)
What am I missing?
file is a string denoting a filename when it comes out of the input function, and it remains a string. So when you iterate over it, you get the letters of the filename one by one. When you call open(file) that returns an object that can be iterated over to provide file content, but you are not currently giving that object a name or re-using it. You really mean something like:
file_name = input('Enter file name:')
file_handle = open(file_name) # this doesn't change file_name, but it does return something new (let's call that file_handle)
for line in file_handle:
    ....
file_handle.close()
...although the more idiomatic, Pythonic way is to use a with statement:
file_name = input('Enter file name:')
with open(file_name) as file_handle:
    for line in file_handle:
        ....
# and then you don't have to worry about closing the file at the end (or about whether it has been left open if an exception occurs)
Note that in Python 2 the variable file_handle is an object whose class is called file, which is a built-in name (one of the reasons I've changed the variable names here). In Python 3 the class is io.TextIOWrapper, but keeping file_name and file_handle apart is still clearer.
I think you'll want to start the running total at a number that can be added to.
Then, you need to get the file handle.
And the regex makes rstrip unnecessary.
running_total = 0
with open(file) as f:
    for line in f:
        running_total += sum(float(x) for x in re.findall("[0-9]+", line))
print(running_total)
Also here
https://stackoverflow.com/a/35592562/2308683
Use "with open() as" to read your file, because it closes the file automatically. Otherwise you need to explicitly tell it to close the file.
Assigning running_total as None threw errors for me, but giving it a value of 0 fixed this issue.
Also, instead of using regex and stripping lines, you can use isnumeric(). This also removes the second for loop you're using, which should be more efficient. Note that it adds up individual digits, so 12 counts as 1 + 2.
file = input('Enter file name:')
with open(file, 'r') as f:
    file = f.read()
print(file)
running_total = 0
# file is now one string, so this loop visits one character at a time
for line in file:
    if line.isnumeric():
        running_total += int(line)
print(running_total)
I tested this with a txt file containing numbers on their own rows and numbers embedded in words and it correctly found all instances.
Edit: I just realized the poster wanted to sum all the numbers, not find all the instances. Changed running_total += 1 to running_total += int(line).
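A small sketch contrasting the two ways of summing, on a made-up sample string. Iterating over the string returned by read() visits single characters, so a character-by-character isnumeric() check adds digits one at a time, while re.findall treats each run of digits as one number.

```python
import re

text = "abc12\n7\nx3y40\n"  # illustrative file contents

# Character by character: 12 is counted as 1 + 2, 40 as 4 + 0.
digit_sum = sum(int(ch) for ch in text if ch.isnumeric())

# Run of digits by run of digits: 12 + 7 + 3 + 40.
number_sum = sum(int(n) for n in re.findall("[0-9]+", text))

print(digit_sum)   # 17
print(number_sum)  # 62
```

Which of the two totals is wanted depends on whether multi-digit numbers should count as whole numbers, as in the original question.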
I am trying to understand what the commented-out lines of code below do. When the lines are commented out, the program works as expected: the function tuple_to_word creates a dictionary with the lines of words.txt as the values.
When the code is uncommented, however, the program only prints an empty dictionary. I can't understand why the for loop would have any effect on the call to tuple_to_word. I am guessing that the for loop in question changes the underlying file object, but how?
fin = open('words.txt')
word_dict = {}
'''
for i in fin:
    word_dict[i.strip()] = 1
'''
def signature(s):
    t = list(s)
    t.sort()
    t = ''.join(t)
    return t
def tuple_to_word():
    words_match_tuple = { }
    for line in fin:
        word = line.strip().lower()
        t = signature(word)
        words_match_tuple.setdefault(t, []).append(word)
    return words_match_tuple
print tuple_to_word()
The answer is: if you activate the code between ''' .. ''' this will parse the input file line by line. Then the function tuple_to_word() will find the file cursor at the end and there will be no line to parse from the input file.
You should either reopen the input file or go to the beginning of the file with:
fin.seek(0)
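A minimal sketch of the effect, with io.StringIO standing in for the opened words.txt so it runs without the file. The variable names first_pass, second_pass, and third_pass are illustrative.

```python
import io

fin = io.StringIO("one\ntwo\n")  # stand-in for open('words.txt')

first_pass = [line.strip() for line in fin]   # reads every line
second_pass = [line.strip() for line in fin]  # cursor is at the end: nothing left

fin.seek(0)                                   # move the cursor back to the start
third_pass = [line.strip() for line in fin]   # reads every line again

print(first_pass, second_pass, third_pass)
```

The second loop yields an empty list for the same reason tuple_to_word() returns an empty dictionary after the uncommented loop has run.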
I just started learning python 3 weeks ago, I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line of code in the file. I just made a random file named it myfile and saved it to my desktop.
myfile= open('myfile', 'r')
line= myfile.readlines()
len(max(line))-1
# the "-1" is to remove the \n
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du=os.popen('du /usr/local')
while 1:
    line= du.readline()
    if not line:
        break
    if list(line).count('/')==3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need:
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line) # ... something
        # something
Note that this is a clumsy way to loop over a file; for line in myfile does the same job. The loop ends because readline() returns an empty string only at the end of the file (a blank line comes back as '\n'). But homework is homework...
max(['b','aaa']) is 'b'
This lexicographic order isn't what you want to maximise. You can use the key argument to choose a different function to maximise, like len.
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(line, key=len)) - 1, where line is the list returned by readlines().
A more elegant solution would be to use a generator expression:
max(len(line)-1 for line in myfile.readlines())
As an aside, you should open the file in a with statement, which takes care of closing the file at the end of the indented block:
with open('myfile', 'r') as mf:
    print max(len(line)-1 for line in mf.readlines())
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument that extracts the length from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see whether its length is greater than the longest one you have seen so far, which you could store in a separate variable initialized to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1 # Nothing was read so far
with open("filename.txt", "r") as f: # Opens the file and closes it automatically at the end
    for line in f:
        max_len = max(max_len, len(line))
print max_len
As this is homework... I would ask myself whether I should count the line feed character or not. If you need to chop the last char, replace len(line) with len(line[:-1]).
If you have to use while, try this:
max_len = -1 # Nothing was read
with open("t.txt", "r") as f: # Opens the file
    while True:
        line = f.readline()
        if(len(line)==0):
            break
        max_len = max(max_len, len(line[:-1]))
print max_len
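A Python 3 sketch of the same while loop that strips only a trailing newline with rstrip('\n') instead of always chopping the last character, so a final line without a newline is measured in full. The function name longest_line_length is hypothetical, and io.StringIO stands in for an open file.

```python
import io


def longest_line_length(f):
    """Return the length of the longest line in f, not counting newlines."""
    max_len = 0
    while True:
        line = f.readline()
        if line == "":  # readline() returns "" only at end of file
            break
        max_len = max(max_len, len(line.rstrip("\n")))
    return max_len


# The last line has no trailing newline; line[:-1] would measure it as 4.
print(longest_line_length(io.StringIO("ab\n\nabcde")))  # 5
```

The blank line in the sample is read as "\n", which is not empty, so the loop keeps going until the real end of the data.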
For those still in need, this is a little function which does what you need:
def get_longest_line(filename):
    length_lines_list = []
    # the with block closes the file as soon as the lines have been read
    with open(filename, "r") as open_file_name:
        all_text = open_file_name.readlines()
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()