Some questions on python (files) [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm new here and I'm also new to python. I would like to get your help, please.
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
    if prefix != word[0]:
        return 0
    else:
        return dictionary[prefix]
    funf.close()
when I run this:
inpath = "filetext1.txt"
print(lines(inpath,"But"))
and I get this:
Traceback (most recent call last):
  File "C:...\...\....py", line 29, in <module>
    print(lines(inpath,"This"))
  File "C:...\...\....py", line 11, in lines
    if prefix != word[0]:
UnboundLocalError: local variable 'word' referenced before assignment
What is the problem, and how can I change it to make it better?
I'm asking for ideas and options (but please, without changing more things in the code; it has to keep something like this structure).
Thanks!

In your code, the if prefix != word[0] part is happening outside the loop, after the loop has finished running. So, for a non-empty file, word will be the split of the last line of the file. And for an empty file, word will never have been set, causing exactly the error you posted.
As a side note, that for lines in f: is looping over some global object f, not the file you just opened, which is called funf. So, I suspect that f is some kind of empty iterable, and you're seeing this error even when the file you wanted to look at is not empty. If you want to loop over funf, you have to tell Python funf, not f.
And you already know this isn't correct, as in this comment:
word is the split of line. I can't do it outside the for loop
If you want to run it inside the loop, you will need to indent it to match the code inside the loop. In Python, block structure is based on indentation level:
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
        if prefix != word[0]:
            return 0
        else:
            return dictionary[prefix]
    funf.close()
That means you'll no longer get an error; word will always be defined when you use it.
There are other problems with this code: you're returning after the first line, meaning you'll never get to the second line; you're returning before you close the file, meaning the file never gets explicitly closed; it's very misleading to use plural variable names for individual things and singular variable names for lists of things; it's confusing to use a local variable with the same name as the function; etc. But one thing at a time…
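The empty-input case can be reproduced without any file at all. This is only a minimal sketch: first_word_of_last_line is a hypothetical helper made up for the demonstration, not part of the code in the question.

```python
def first_word_of_last_line(lines):
    """Hypothetical helper: uses the loop variable after the loop has ended."""
    for line in lines:
        word = line.split()
    # If `lines` was empty, the loop body never ran and `word` was never bound.
    return word[0]


print(first_word_of_last_line(["This is a test.", "But this isn't."]))  # But

try:
    first_word_of_last_line([])
except UnboundLocalError as exc:
    # The exact message text varies between Python versions.
    print("empty input:", exc)
```

Anything iterable that yields no items behaves the same way here, whether it is an empty list or an empty file.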
After half an hour of pulling teeth, you finally explained what you're trying to do:
I'm trying to count the number of lines whose first word matches the prefix
There is no way to do that with this structure. Whether you do the if inside the loop or out, it doesn't make any sense.
The simplest way to fix it is to remove the if entirely. You're building up a dictionary of counts of each first word, right? So, just look up the value for the given prefix at the end:
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
    funf.close()
    return dictionary.get(prefix, 0)
This will work, but it's incredibly wasteful to build up this whole dictionary just to get a single value out of it, and makes your code much more complicated as well… the whole thing could be written as:
def lines(path, prefix):
    with open(path) as f:
        return sum(1 for line in f if line.startswith(prefix))
Here's my filetext1.txt:
This is a test.
But this isn't.
But this is.
And this isn't.
The output should obviously be 2, right?
Both versions of my code, the "simplest fix" and the two-liner, print out this:
2
This works in both Python 3.3 and 2.7. If it's not working for you, either you failed at copying and pasting the code, or your input file doesn't have any lines starting with "But ".

If you're trying to count the number of lines whose first word matches the prefix, why not do something simple like
def lines(path, prefix):
    N_matches = 0
    f = open(path, 'r')
    for line in f:
        words = line.split()
        first_word = words[0]
        if first_word == prefix:
            N_matches += 1
    f.close()
    return N_matches
This could also be accomplished via less code:
def lines(path, prefix):
    with open(path, 'r') as f:
        return sum([1 for line in f if line.split()[0] == prefix])
As @abarnert points out, an even better way is
return sum(1 for line in f if line.startswith(prefix))
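Note that the two approaches above are not quite interchangeable: comparing the first word and calling startswith can give different counts, and indexing [0] on the result of split() fails on a blank line. The sketch below illustrates both points; count_first_word and count_startswith are names invented for this example, and they take any iterable of lines (an open file works too).

```python
def count_first_word(lines, prefix):
    """Count lines whose first whitespace-separated word equals prefix."""
    count = 0
    for line in lines:
        words = line.split()
        # A blank line splits to an empty list, so guard before indexing.
        if words and words[0] == prefix:
            count += 1
    return count


def count_startswith(lines, prefix):
    """Count lines that begin with the characters in prefix."""
    return sum(1 for line in lines if line.startswith(prefix))


sample = ["But this is.\n", "\n", "Butter is not a match.\n"]
print(count_first_word(sample, "But"))   # 1
print(count_startswith(sample, "But"))   # 2
```

Which one is right depends on whether "Butter" should count as starting with the prefix "But".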

Related

Python 3 How to add specific lines from a list to an array

Below is my code. This code reads lines from a file (called compsc), strips the \n from them, puts them into an array and randomly prints them, eliminating each option that has already been printed. What I want to know is how to read only a specific set of lines into the array, as I will have lots of lines in the .txt file. So, is there some code that can do that, or do I have to put readlines() somewhere?
Thanks in advance!
import random
with open("compsc.txt", "r") as ins:
    qarray = []
    for line in ins:
        line = line.strip()
        qarray.append(line)
print (qarray)
loop = 0
while loop != 4:
    newquestion = random.sample(qarray, 1)
    print (newquestion)
    qarray.remove(newquestion[0])
    loop = loop + 1
You will need to create some function to decide whether or not to keep the line.
import random
def line_filter(line):
    """Return True if you want to keep line, False otherwise."""
    ...
with open("compsc.txt", "r") as f:
    questions = [line.strip() for line in f if line_filter(line)]
random.shuffle(questions)
for question in questions[:4]:
    print(question)
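This has been covered on this site before. In brief, if your file is not huge (i.e. reading it all does not cause memory problems), you can indeed use readlines() and slice the resulting list. Also look into the linecache module, which fetches individual lines by number and caches the file's contents.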
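If "a specific set of lines" means a contiguous range, another option is itertools.islice, which skips ahead without building a list of the whole file first. This is only a sketch: read_line_range is a hypothetical helper, and the temporary file stands in for the real compsc.txt.

```python
import itertools
import os
import tempfile


def read_line_range(path, start, stop):
    """Return stripped lines start..stop-1 (0-based), like a list slice."""
    with open(path, "r") as f:
        return [line.strip() for line in itertools.islice(f, start, stop)]


# Demonstration with a throwaway file instead of the real compsc.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("q0\nq1\nq2\nq3\nq4\n")

print(read_line_range(tmp.name, 1, 4))  # ['q1', 'q2', 'q3']
os.remove(tmp.name)
```

The returned list can then be shuffled or sampled exactly as in the code above.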

Slice variable from specified letter to specified letter in line that varies in length

New to the site so I apologize if I format this incorrectly.
So I'm searching a file for lines containing
Server[x] ip.ip.ip.ip response=235ms accepted....
where x can be any number greater than or equal to 0, then storing that information in a variable named line.
I'm then printing this content to a tkinter GUI and it's way too much information for the window.
To resolve this I thought I would slice the information down with a return line[15:30] in the function but the info that I want off these lines does not always fall between 15 and 30.
To resolve this I tried to make a loop with
return line[cnt1:cnt2]
checked cnt1 and cnt2 in a loop until cnt1 meets "S" and cnt2 meets "a" from accepted.
The problem is that I'm new to Python and I can't get the loop to work.
def serverlist(count):
    try:
        with open("file.txt", "r") as f:
            searchlines = f.readlines()
        if 'f' in locals():
            for i, line in enumerate(reversed(searchlines)):
                cnt = 90
                if "Server["+str(count)+"]" in line:
                    if line[cnt] == "t":
                        cnt += 1
                    return line[29:cnt]
    except WindowsError as fileerror:
        print(fileerror)
I did a reversed on the line reading because the lines I am looking for repeats over and over every couple of minutes in the text file.
Originally I wanted to scan from the bottom and stop when it got to server[0] but this loop wasn't working for me either.
I gave up and started just running serverlist(count) and specifying the server number I was looking for instead of just running serverlist().
Hopefully when I understand the problem with my original loop I can fix this.
End goal here:
file.txt has multiple lines with
<timestamp/date> Server[x] ip.ip.ip.ip response=<time> accepted <unneeded garbage>
I want to cut just the Server[x] and the response time out of that line and show it somewhere else using a variable.
The line can range from Server[0] to Server[999] and the same response times are checked every few minutes so I need to avoid duplicates and only get the latest entries at the bottom of the log.
Im sorry this is lengthy and confusing.
EDIT:
Here is what I keep thinking should work but it doesn't:
def serverlist():
    ips = []
    cnt = 0
    with open("file.txt", "r") as f:
        for line in reversed(f.readlines()):
            while cnt >= 0:
                if "Server["+str(cnt)+"]" in line:
                    ips.append(line.split()) # split on spaces
                    cnt += 1
    return ips
My test log file has server[4] through server[0]. I would think that the above would read from the bottom of the file, print server[4] line, then server[3] line, etc and stop when it hits 0. In theory this would keep it from reading every line in the file(runs faster) and it would give me only the latest data. BUT when I run this with while cnt >=0 it gets stuck in a loop and runs forever. If I run it with any other value like 1 or 2 then it returns a blank list []. I assume I am misunderstanding how this would work.
Here is my first approach:
def serverlist(count):
    with open("file.txt", "r") as f:
        for line in f.readlines():
            if "Server[" + str(count) + "]" in line:
                return line.split()[1] # split on spaces
    return False
print serverlist(30)
# ip.ip.ip.ip
print serverlist(";-)")
# False
You can change the index in line.split()[1] to get the specific space separated string of the line.
Edit: Sure, just remove the if condition to get all ip's:
def serverlist():
    ips = []
    with open("file.txt", "r") as f:
        for line in f.readlines():
            if line.strip().startswith("Server["):
                ips.append(line.split()[1]) # split on spaces
    return ips
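For the stated end goal (only the server number and the latest response time per server), a regular expression avoids counting character positions altogether. The sketch below is not the answerer's method. It assumes the line layout described in the question, and the names SERVER_RE and latest_responses as well as the sample log lines are made up for illustration.

```python
import re

# Assumed line layout, based on the description in the question:
#   <timestamp> Server[12] 10.0.0.5 response=235ms accepted ...
SERVER_RE = re.compile(r"Server\[(\d+)\].*?response=(\S+)")


def latest_responses(lines):
    """Map server number -> most recent response time string."""
    latest = {}
    # Walk from the bottom so the first hit per server is the newest one.
    for line in reversed(list(lines)):
        match = SERVER_RE.search(line)
        if match:
            server = int(match.group(1))
            # setdefault keeps the value already stored, i.e. the newer entry.
            latest.setdefault(server, match.group(2))
    return latest


log = [
    "12:00 Server[0] 10.0.0.1 response=235ms accepted x",
    "12:00 Server[1] 10.0.0.2 response=90ms accepted x",
    "12:05 Server[0] 10.0.0.1 response=180ms accepted x",
]
print(latest_responses(log))  # {0: '180ms', 1: '90ms'}
```

With a real log, the list passed in could be the result of f.readlines() on file.txt.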

How to make a dictionary that counts repeated amount of length of any words from a file?

How do I correct this so my dictionary reads length of a word = how many times the length of the word is repeated? Parameters is a file.
def wordLengths(fileName):
d = {}
f = open(fileName)
filename.close()
for line in f:
for word in line:
if len(word) not in d:
d[len(word)] = count.len(word)
return(d)
You're on the right track, but you've got a few mistakes. Let's look at them line by line.
def wordLengths(fileName):
d = {}
f = open(fileName)
So far, so good
filename.close()
You can't close a filename—it's just a string. You can only close a file object, like f. Also, filename and fileName aren't the same thing; capitalization counts. Also, it's too early to close the file—you want to do it after reading all the lines, otherwise you won't get to read anything. So, scrap this line, and add a f.close() right before the return. (A with statement is even better, but you probably haven't learned those yet.)
for line in f:
for word in line:
When you loop over a string, you loop over each character in the string, not each word. If you want words, you have to call line.split().
if len(word) not in d:
d[len(word)] = count.len(word)
Close, but not right. What you want here is: if the length isn't already in the dictionary, store 1; otherwise, add 1 to what's already there. What you've written is: if the length isn't already there, store the length (using some object that doesn't exist); otherwise, do nothing. So:
if len(word) not in d:
    d[len(word)] = 1
else:
    d[len(word)] += 1
return(d)
That one's fine (but remember the f.close() above it). However, it's more idiomatic to write return d.
One more comment: You should be consistent with your indentation: always indent 4 spaces, not a random mix of 1, 4, and 7 spaces. It makes your code a lot easier to read—especially in Python, where indenting something wrong can change the meaning of the code, and that can be hard to spot when each indent level isn't consistent.
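Putting the fixes above together gives something like the following sketch. It uses a with statement in place of the explicit f.close() (the variant described above as even better), and the sample text written to the temporary file is made up for the demonstration.

```python
import os
import tempfile


def wordLengths(fileName):
    """Return a dict mapping word length -> number of words of that length."""
    d = {}
    with open(fileName) as f:          # the file is closed when the block ends
        for line in f:
            for word in line.split():  # split() yields words, not characters
                if len(word) not in d:
                    d[len(word)] = 1
                else:
                    d[len(word)] += 1
    return d


# Demonstration with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat\non a mat\n")

print(wordLengths(tmp.name))  # {3: 4, 2: 1, 1: 1}
os.remove(tmp.name)
```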

using different files for a function, but I/O error because of using with

I'm pretty new at this, so the answer to this problem might be rather easy; I'm just not seeing it at the moment.
I have a function that is used to count the number of lines in a file that contain a given word. Within the function I use with file as fin, as shown below. If I try to re-use the function, it gives the error that the file is closed. I solved it, but it looks bad (IMO):
def lcount(keyword):
    with file_to_count as fin:
        return sum([1 for line in fin if keyword in line])
file_to_count = open('peer.txt', 'r')
y = lcount('bars')
print y
file_to_count = open('peer2.txt', 'r')
w = lcount('bar')
file_to_count = open('Peer2.txt', 'r')
e = lcount('table')
print w, e
If I do not restate
file_to_count = open('Peer2.txt', 'r')
for the second time (after I count 'bar'), it gives the I/O error when 'table' goes through the function.
So the code works, but I want to use lcount for other words. Do I need to restate file_to_count every time, or are there solutions/alternatives?
Thanks for your attention.
The problem is that you're using the with statement, which treats the file as a context manager and makes sure it is closed at the end of the block. So your function closes the file, and you then need to open it again.
But also, using a global object isn't a great idea; as you can see, it can (and will) introduce a lot of subtle bugs. Try to write functions that don't depend on anything other than their parameters.
Like this:
def lcount(fname, keyword):
    with open(fname) as fin:
        # Use a generator expr. to avoid intermediate list
        return sum(1 for line in fin if keyword in line)
        # Or better, since True == 1, use this line instead:
        # return sum(keyword in line for line in fin)
#file is closed now
fname = "peer2.txt"
words = "bars bar table".split()
# don't repeat yourself
for word in words:
    print lcount(fname, word)
You can refactor your function with an argument for the file name:
def lcount(filename, keyword):
    with open(filename, 'r') as fin:
        return sum([1 for line in fin if keyword in line])
and use it like this:
y = lcount('peer.txt','bar')
w = lcount('peer2.txt','bar')
That's a strange way to handle file I/O through a context manager. It's more intuitive, and more widely accepted, to open the file in the with statement itself:
def lcount(fname, keyword):
    with open(fname) as fin:
        return sum(1 for line in fin if keyword in line)
Also sum accepts a generator, so it may not be required to generate a list before calling sum.
Note
If you are curious why your approach fails, remember that the with statement calls the __exit__ method of the object produced by its expression once the block ends, which for a file object closes the file. So it's important to create the object in the with statement itself, so that every call opens (and closes) its own file instead of trying to re-enter one that is already closed.
You should restructure things entirely. The with statement is meant to provide a scope for an object, so that the object is cleanly closed at the end. I'm not sure file_to_count is available as a global within the with statement, but that's a bad practice anyway - better to pass the file into the function.
def lcount(keyword, fin):
    return sum([1 for line in fin if keyword in line])
with open('peer.txt', 'r') as file_to_count:
    y = lcount('bars', file_to_count)
    print y
with open('peer2.txt', 'r') as file_to_count:
    w = lcount('bar', file_to_count)
    file_to_count.seek(0) # start over at the beginning of the file
    e = lcount('table', file_to_count)
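The behaviour behind the original error is easy to reproduce in isolation. In this sketch, reuse_after_with is a hypothetical helper written only to show that a file object is closed once its with block ends and cannot be entered again; the temporary file stands in for peer2.txt.

```python
import os
import tempfile


def reuse_after_with(path, keyword):
    """Count matching lines, then try to reuse the same file object."""
    fin = open(path, "r")
    with fin:
        count = sum(1 for line in fin if keyword in line)
    # Leaving the with block closed fin, so entering it again fails.
    try:
        with fin:
            pass
        reused = True
    except ValueError:  # "I/O operation on closed file"
        reused = False
    return count, fin.closed, reused


# Demonstration with a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("bar\nbars\ntable\n")

print(reuse_after_with(tmp.name, "bar"))  # (2, True, False)
os.remove(tmp.name)
```

This is why each call needs either a freshly opened file or, as in the last answer, a still-open file that is rewound with seek(0).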

Loop within a loop not re-looping with reading a file Python3

Trying to write a code that will find all of a certain type of character in a text file
For vowels it'll find all of the number of a's but won't reloop through text to read e's. help?
def finder_character(file_name,character):
    in_file = open(file_name, "r")
    if character=='vowel':
        brain_rat='aeiou'
    elif character=='consonant':
        brain_rat='bcdfghjklmnpqrstvwxyz'
    elif character=='space':
        brain_rat=''
    else:
        brain_rat='!@#$%^&*()_+=-123456789{}|":?><,./;[]\''
    found=0
    for line in in_file:
        for i in range (len(brain_rat)):
            found += finder(file_name,brain_rat[i+1,i+2])
    in_file.close()
    return found
def finder(file_name,character):
    in_file = open(file_name, "r")
    line_number = 1
    found=0
    for line in in_file:
        line=line.lower()
        found +=line.count(character)
    return found
If you want to use your original code, you have to pass the filename to the finder() function, and open the file there, for each char you are testing for.
The reason for this is that the file object (in_file) is an iterator, not a list. An iterator hands back the next item each time next() is called on it. When you say
for line in in_file:
the for ... in statement keeps calling next(in_file) for as long as it produces a value. Once the iterator has no more values to give (it signals this by raising StopIteration), we say it is exhausted. You can't re-use an exhausted iterator; if you want to start over, you have to make a new one by opening the file again (or, for a file, rewind it with seek(0)).
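The exhaustion described above can be seen in a few lines. This sketch uses io.StringIO as a stand-in for an open text file (an assumption made so the example needs no file on disk); a real file object behaves the same way when iterated.

```python
import io

# io.StringIO stands in for an open text file here.
in_file = io.StringIO("aaa\neee\n")

first_pass = [line for line in in_file]
second_pass = [line for line in in_file]   # already at the end: yields nothing
in_file.seek(0)                            # rewind to the start
third_pass = [line for line in in_file]

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```

The empty second pass is what happens when an inner loop tries to read the same file again for the next character.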
I allowed myself to rewrite your code. This should give you the desired result. If anything is unclear, please ask!
def finder_character(file_name,character):
    with open(file_name, "r") as ifile:
        if character=='vowel':
            brain_rat='aeiou'
        elif character=='consonant':
            brain_rat='bcdfghjklmnpqrstvwxyz'
        elif character=='space':
            brain_rat=' '
        else:
            brain_rat='!@#$%^&*()_+=-123456789{}|":?><,./;[]\''
        return sum(1 if c.lower() in brain_rat else 0 for c in ifile.read())
test.txt:
eeehhh
iii!#
kk ="k
oo o
Output:
>>>print(finder_character('test.txt', 'vowel'))
9
>>>print(finder_character('test.txt', 'consonant'))
6
>>>print(finder_character('test.txt', 'space'))
2
>>>print(finder_character('test.txt', ''))
4
If you are having problems understanding the return line, it should be read backwards, like this:
Sum this generator:
Make a generator with values as v in:
for c in ifile.read():
    if c.lower() in brain_rat:
        v = 1
    else:
        v = 0
If you want to know more about generators, I recommend the Python Wiki page concerning it.
This seems to be what you are trying to do in finder_character. I'm not sure why you need finder at all.
In python you can loop over iterables (like strings), so you don't need to do range(len(string)).
for line in in_file:
    for i in brain_rat:
        if i in line: found += 1
There appear to be a few other oddities in your code too:
You open (and iterate through) the file twice, but only closed once.
line_number is never used
You get the total of a character in a file for each line in the file, so the total will be vastly inflated.
This is probably a much safer version, with open... is generally better than open()... file.close() as you don't need to worry as much about error handling and closing. I've added some comments to help explain what you are trying to do.
def finder_character(file_name,character):
    found=0 # Initialise the counter
    with open(file_name, "r") as in_file:
        # Open the file
        opts = { 'vowel':'aeiou',
                 'consonant':'bcdfghjklmnpqrstvwxyz',
                 'space':' ' }
        default= '!@#$%^&*()_+=-123456789{}|":?><,./;[]\''
        for line in in_file:
            # Iterate through each line in the file
            for c in opts.get(character,default):
                # With each line, also iterate through the set of chars to check.
                if c in line.lower():
                    # If the current character is in the line
                    found += 1 # increment the counter.
    return found # return the counter
