Trying to write code that finds all characters of a certain type in a text file
For vowels it finds the number of a's, but it won't loop through the text again to count the e's. Help?
def finder_character(file_name,character):
    in_file = open(file_name, "r")
    if character=='vowel':
        brain_rat='aeiou'
    elif character=='consonant':
        brain_rat='bcdfghjklmnpqrstvwxyz'
    elif character=='space':
        brain_rat=''
    else:
        brain_rat='!##$%^&*()_+=-123456789{}|":?><,./;[]\''
    found=0
    for line in in_file:
        for i in range (len(brain_rat)):
            found += finder(file_name,brain_rat[i+1,i+2])
    in_file.close()
    return found
def finder(file_name,character):
    in_file = open(file_name, "r")
    line_number = 1
    found=0
    for line in in_file:
        line=line.lower()
        found +=line.count(character)
    return found
If you want to use your original code, you have to pass the filename to the finder() function, and open the file there, for each char you are testing for.
The reason for this is that the file object (in_file) is an iterator, not a list. An iterator returns the next item each time you call next() on it (in_file.next() in Python 2, next(in_file) in Python 3). When you say
for line in in_file:
The for ... in statement keeps calling next() on in_file for as long as it returns a value. (A generator works the same way; it produces its values with the keyword yield, but don't think about that for now.) When there are no more lines, the iterator is exhausted. You can't reuse an exhausted iterator. If you want to start over again, you have to open the file again (or rewind it with in_file.seek(0)).
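A minimal sketch of the exhaustion behaviour described above. It assumes io.StringIO as a stand-in for a real file; an object returned by open() in text mode iterates over its lines the same way. seek(0) is shown as one way to rewind instead of reopening the file.

```python
import io

# io.StringIO stands in for a text file opened with open(...).
in_file = io.StringIO("eeehhh\niii!#\n")

first_pass = [line for line in in_file]   # reads every line
second_pass = [line for line in in_file]  # iterator is exhausted: nothing left

in_file.seek(0)                           # rewind to the start of the "file"
third_pass = [line for line in in_file]   # the lines are available again

print(first_pass)   # ['eeehhh\n', 'iii!#\n']
print(second_pass)  # []
print(third_pass)   # ['eeehhh\n', 'iii!#\n']
```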
I took the liberty of rewriting your code. This should give you the desired result. If anything is unclear, please ask!
def finder_character(file_name,character):
    with open(file_name, "r") as ifile:
        if character=='vowel':
            brain_rat='aeiou'
        elif character=='consonant':
            brain_rat='bcdfghjklmnpqrstvwxyz'
        elif character=='space':
            brain_rat=' '
        else:
            brain_rat='!##$%^&*()_+=-123456789{}|":?><,./;[]\''
        return sum(1 if c.lower() in brain_rat else 0 for c in ifile.read())
test.txt:
eeehhh
iii!#
kk ="k
oo o
Output:
>>>print(finder_character('test.txt', 'vowel'))
9
>>>print(finder_character('test.txt', 'consonant'))
6
>>>print(finder_character('test.txt', 'space'))
2
>>>print(finder_character('test.txt', ''))
4
If you are having problems understanding the return line, it should be read backwards, like this:
Sum this generator:
    Make a generator with values as v in:
        for c in ifile.read():
            if c.lower() in brain_rat:
                v = 1
            else:
                v = 0
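Written out as an ordinary loop, the one-liner does roughly the following. This is a sketch: count_in_set is a hypothetical helper name, and it takes the file contents as a string so it can be tried without a file.

```python
def count_in_set(text, brain_rat):
    """Explicit-loop equivalent of
    sum(1 if c.lower() in brain_rat else 0 for c in text)."""
    total = 0
    for c in text:                  # visit every character, newlines included
        if c.lower() in brain_rat:  # compare case-insensitively
            v = 1
        else:
            v = 0
        total += v                  # sum() adds up the v values
    return total

# Same contents as test.txt above.
text = "eeehhh\niii!#\nkk =\"k\noo o\n"
print(count_in_set(text, 'aeiou'))  # 9
```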
If you want to know more about generators, I recommend the Python Wiki page on them.
This seems to be what you are trying to do in finder_character. I'm not sure why you need finder at all.
In python you can loop over iterables (like strings), so you don't need to do range(len(string)).
for line in in_file:
    for i in brain_rat:
        found += line.lower().count(i)  # count every occurrence, not just whether it appears
There appear to be a few other oddities in your code too:
You open (and iterate through) the file twice, but only close it once.
line_number is never used
You get the total of a character in a file for each line in the file, so the total will be vastly inflated.
This is probably a much safer version: with open(...) is generally better than open() ... file.close(), because you don't need to worry as much about error handling and closing the file. I've added some comments to help explain what you are trying to do.
def finder_character(file_name, character):
    found = 0  # Initialise the counter
    opts = {'vowel': 'aeiou',
            'consonant': 'bcdfghjklmnpqrstvwxyz',
            'space': ' '}
    default = '!##$%^&*()_+=-123456789{}|":?><,./;[]\''
    with open(file_name, "r") as in_file:
        # Open the file
        for line in in_file:
            # Iterate through each line in the file
            for c in set(opts.get(character, default)):
                # With each line, also iterate through the set of chars to check
                # (set() so a repeated char such as '#' is only counted once).
                found += line.lower().count(c)  # add its occurrences to the counter
    return found  # return the counter
I have a medium-size file (25MB, 1000000 rows), and I want to read every row except every third row.
FIRST QUESTION: Is it faster to load the whole file into memory and then read the rows (method .read()), or to load and read one row at a time (method .readline())?
Since I'm not an experienced coder, I tried the second option with the islice function from the itertools module.
import itertools
with open(input_file) as inp:
    inp_atomtype = itertools.islice(inp, 0, 40, 3)
    inp_atomdata = itertools.islice(inp, 1, 40, 3)
    for atomtype, atomdata in itertools.zip_longest(inp_atomtype, inp_atomdata):
        print(atomtype + atomdata)
Although looping through a single generator (inp_atomtype or inp_atomdata) prints the correct data, looping through both of them simultaneously (as in this code) prints the wrong data.
SECOND QUESTION: How can I reach desired rows using generators?
You don't need to slice the iterator, a simple line counter should be enough:
with open(input_file) as f:
    current_line = 0
    for line in f:
        current_line += 1
        if current_line % 3: # ignore every third line
            print(line) # NOTE: print() will add an additional new line by default
As for turning it into a generator, just yield the line instead of printing.
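A minimal sketch of that change, assuming the counter logic from the snippet above. skip_every_third is a hypothetical name, and io.StringIO stands in for the opened input_file.

```python
import io

def skip_every_third(lines):
    """Yield every line except the 3rd, 6th, 9th, ... (counting from 1)."""
    current_line = 0
    for line in lines:
        current_line += 1
        if current_line % 3:  # non-zero remainder: not a multiple of 3
            yield line        # yield instead of print

f = io.StringIO("a\nb\nc\nd\ne\nf\ng\n")
kept = list(skip_every_third(f))
print(kept)  # ['a\n', 'b\n', 'd\n', 'e\n', 'g\n']
```

With a real file, call it inside the with block, e.g. for line in skip_every_third(f): ..., because the generator reads from f lazily.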
When it comes to speed, given that you'll be reading all your lines anyway, the I/O part will probably take about the same time. You might benefit a bit (in total processing time) from fast list slicing instead of counting lines, if you have enough working memory to hold the file contents and if loading the whole file up front instead of streaming it is acceptable.
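A sketch of the list-slicing alternative, assuming the lines are already in memory (for example from f.readlines()). Both function names are hypothetical; atom_pairs pairs the 1st and 2nd line of every group of three, like the question's atomtype/atomdata.

```python
def two_of_every_three(lines):
    """Return a new list without the 3rd, 6th, 9th, ... element (1-based)."""
    kept = list(lines)  # copy so the caller's list is untouched
    del kept[2::3]      # indices 2, 5, 8, ... are lines 3, 6, 9, ...
    return kept

def atom_pairs(lines):
    # lines[0::3] -> 1st, 4th, 7th, ...; lines[1::3] -> 2nd, 5th, 8th, ...
    # zip() stops at the shorter slice, so an unpaired trailing line is dropped.
    return list(zip(lines[0::3], lines[1::3]))

lines = ["l1\n", "l2\n", "l3\n", "l4\n", "l5\n", "l6\n", "l7\n"]
print(two_of_every_three(lines))  # ['l1\n', 'l2\n', 'l4\n', 'l5\n', 'l7\n']
print(atom_pairs(lines))          # [('l1\n', 'l2\n'), ('l4\n', 'l5\n')]
```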
yield is perfect for this.
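This function yields pairs from an iterable and skips every third item:
def two_thirds(seq):
    _iter = iter(seq)
    while True:
        try:
            yield (next(_iter), next(_iter))
            next(_iter)  # skip every third item
        except StopIteration:
            # Since Python 3.7 (PEP 479), a StopIteration escaping a generator
            # is turned into a RuntimeError, so end the generator explicitly.
            return
You will lose incomplete trailing pairs, which means that two_thirds(range(1)) will stop iterating immediately and two_thirds(range(4)) yields only (0, 1).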
https://repl.it/repls/DullNecessaryCron
You can also use the grouper recipe from the itertools documentation and ignore the third item in each tuple generated:
for atomtype, atomdata, _ in grouper(lines, 3):
    pass
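grouper is a recipe, not a function you can import from itertools, so it has to be defined first. A sketch following the long-standing form of the recipe, with made-up sample lines:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks: grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx"
    args = [iter(iterable)] * n  # n references to the SAME iterator
    return zip_longest(*args, fillvalue=fillvalue)

lines = ["type1\n", "data1\n", "skip1\n", "type2\n", "data2\n", "skip2\n"]
pairs = [(atomtype, atomdata) for atomtype, atomdata, _ in grouper(lines, 3)]
print(pairs)  # [('type1\n', 'data1\n'), ('type2\n', 'data2\n')]
```

If the number of lines is not a multiple of 3, the last tuple is padded with fillvalue (None by default), so atomtype + atomdata would fail on it.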
FIRST QUESTION: I am pretty sure that .readline() is faster than .read(). Plus, the fastest way based on my tests is to loop like this:
with open(file, 'r') as f:
    for line in f:
        ...
SECOND QUESTION: I am not quite sure about this. You may consider using yield.
Here is a code snippet you may refer to:
def myreadlines(f, newline):
    buf = ""
    while True:
        while newline in buf:
            pos = buf.index(newline)
            yield buf[:pos]
            buf = buf[pos + len(newline):]
        chunk = f.read(4096)
        if not chunk:
            # the end of file
            yield buf
            break
        buf += chunk
with open("input.txt") as f:
    for line in myreadlines(f, "{|}"):
        print(line)
q2: here's my generator:
def yield_from_file(input_file):
    with open(input_file) as file:
        yield from file
def read_two_skip_one(gen):
    while True:
        try:
            val1 = next(gen)
            val2 = next(gen)
            yield val1, val2
            _ = next(gen)
        except StopIteration:
            break
if __name__ == '__main__':
    for atomtype, atomdata in read_two_skip_one(yield_from_file('sample.txt')):
        print(atomtype + atomdata)
sample.txt was generated with a bash shell (it's just lines counting to 100)
for i in {001..100}; do echo $i; done > sample.txt
Regarding Q1: if you're reading the file multiple times, you'd be better off keeping it in memory. Otherwise, you're fine reading it line by line.
Regarding the problem you're having with the wrong results:
Both itertools.islice(inp, ...) calls use the same inp iterator, and both call next(inp) to provide you with a value.
Each call to next() on an iterator advances its state, so the two islice objects consume lines out from under each other; that's where your problem comes from.
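A small demonstration of that interference, using iter(range(9)) as a stand-in for the file object inp. The fix at the end uses itertools.tee to give each islice its own independent copy of the iterator; tee is a swapped-in technique, not something from the answers here.

```python
from itertools import islice, tee

it = iter(range(9))                    # stands in for the file object inp
inp_atomtype = islice(it, 0, None, 3)  # intends to take items 0, 3, 6
inp_atomdata = islice(it, 1, None, 3)  # intends to take items 1, 4, 7
pairs = list(zip(inp_atomtype, inp_atomdata))
print(pairs)  # [(0, 2), (5, 8)]: each islice skips items the other one consumed

a, b = tee(iter(range(9)))             # two independent iterators over the same data
fixed = list(zip(islice(a, 0, None, 3), islice(b, 1, None, 3)))
print(fixed)  # [(0, 1), (3, 4), (6, 7)]
```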
You can use a generator expression:
with open(input_file, 'r') as f:
    generator = (line for e, line in enumerate(f, start=1) if e % 3)
enumerate adds line numbers to each line, and the if clause ignores line numbers divisible by 3 (default numbering starts at 0, so you have to specify start=1 to get the desired pattern).
Keep in mind that you can only use the generator while the file is still open.
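A sketch of what happens in both cases, assuming a throwaway file created in a temporary directory (the file name sample.txt is hypothetical). Consuming the generator inside the with block works; consuming it after the block raises ValueError because the file is already closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    input_file = os.path.join(tmp, "sample.txt")  # hypothetical file for the demo
    with open(input_file, "w") as out:
        out.write("1\n2\n3\n4\n5\n6\n")

    with open(input_file, "r") as f:
        generator = (line for e, line in enumerate(f, start=1) if e % 3)
        inside = list(generator)  # consumed while f is open: works

    with open(input_file, "r") as f:
        late = (line for e, line in enumerate(f, start=1) if e % 3)
    # f is closed here, so the generator's first read fails.
    try:
        list(late)
        error = None
    except ValueError as exc:     # "I/O operation on closed file."
        error = exc

print(inside)  # ['1\n', '2\n', '4\n', '5\n']
```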
I have a function that writes the contents of a list into a text file. It writes each element of the list to the file on its own line.
def write_file(filename):
    name_file = filename
    filename = open(name_file, 'w')
    for line in list:
        if line == len(list)-1:
            filename.write(line)
        else:
            filename.write(line+'\n')
    filename.close()
I tend to notice an issue where an empty line is generated at the end of the text file, and I'm wondering whether I am writing the file correctly.
Let's say my list contains [1,2,3,4]; writing it to the text file would give me:
1
2
3
4
#in some cases, an empty newline is printed here at the end
I have no idea how to check whether the write function is generating an extra line at the end due to the '\n', so I'd appreciate any feedback.
Instead of writing to the buffer so many times, do a .join, and write the result once:
with open(filename, 'w') as fp:
    fp.write('\n'.join(your_list))
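To check whether a trailing newline was written, read the file back and look at repr() of its contents, which makes the '\n' characters visible. A sketch, assuming a temporary file (test.txt in a temporary directory is a made-up path) and a list of strings:

```python
import os
import tempfile

def write_file(filename, your_list):
    with open(filename, 'w') as fp:
        fp.write('\n'.join(your_list))  # separators only BETWEEN items

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")  # hypothetical path for the demo
    write_file(path, ["1", "2", "3", "4"])
    with open(path) as fp:
        content = fp.read()

print(repr(content))           # '1\n2\n3\n4'
print(content.endswith('\n'))  # False: no trailing newline
```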
Update:
@John Coleman has pointed out a misunderstanding. It seems that the last line should not have any new line character. This can be corrected by using enumerate() to provide a line count, checking whether it's the last line when printing, and varying the line end character accordingly:
def write_file(filename, data):
    with open(filename, 'w') as f:
        for line_no, item in enumerate(data, 1):
            print(item, file=f, end='\n' if line_no < len(data) else '')
This is not as elegant as using '\n'.join(data), but it is memory efficient for large lists.
An alternative to join() is:
def write_file(filename, data):
    with open(filename, 'w') as f:
        print(*data, file=f, sep='\n', end='')
Original answer:
Why not simply use print() and specify the output file?
def write_file(filename, data):
    with open(filename, 'w') as f:
        for item in data:
            print(item, file=f)
Or more succinctly:
def write_file(filename, data):
    with open(filename, 'w') as f:
        print(*data, file=f, sep='\n')
The former is preferred if you have a large list because the latter needs to unpack the list to pass its contents as arguments to print().
Both options will automatically take care of the new line characters for you.
Opening the file in a with statement will also take care of closing the file for you.
You could also use '\n'.join() to join the items in the list. Again, this is feasible for smallish lists. Also, your example shows a list of integers - print() does not require that its arguments first be converted to strings, as does join().
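A short sketch of that difference: join() raises TypeError on a list of integers, so each item has to be converted with str() first, whereas print() does the conversion itself.

```python
data = [1, 2, 3, 4]

try:
    '\n'.join(data)              # items are ints, not strings
    join_error = None
except TypeError as exc:
    join_error = exc

text = '\n'.join(str(item) for item in data)  # convert each item first
print(repr(text))  # '1\n2\n3\n4'
```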
Try
def write_file(filename):
    name_file = filename
    filename = open(name_file, 'w')
    for line in list:
        if line == list[-1]:
            filename.write(line)
        else:
            filename.write(line+'\n')
    filename.close()
In your example, line == len(list)-1 compares the item to the length of the list minus 1 (an int) instead of to the last item in the list.
This is still not perfect: if the list contains repeated items, such as [1,2,3,5,2], every item equal to the last one is written without a newline. In that case it is better to use join() or to loop over the indices (a for i statement).
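A sketch of the repeated-item problem, using strings so that + '\n' works. render and render_by_index are hypothetical helpers that build the text the loop would write instead of writing a file.

```python
def render(items):
    """Mimic the 'compare with list[-1]' approach."""
    out = []
    for line in items:
        if line == items[-1]:   # compares values, not positions
            out.append(line)
        else:
            out.append(line + '\n')
    return ''.join(out)

def render_by_index(items):
    """Compare positions instead, so repeated values are handled correctly."""
    last = len(items) - 1
    out = []
    for i, line in enumerate(items):
        out.append(line if i == last else line + '\n')
    return ''.join(out)

items = ['1', '2', '3', '5', '2']
print(repr(render(items)))           # '1\n23\n5\n2': the first '2' lost its newline
print(repr(render_by_index(items)))  # '1\n2\n3\n5\n2'
```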
If you want to write to a file from list of strings, you can use the following snippet:
def write_file(filename):
    with open(filename, 'w') as f:
        f.write('\n'.join(lines))
lines = ["hi", "hello"]
write_file('test.txt')
You shouldn't use for line in list here. list shouldn't be used as a variable name, because list is the name of a built-in type in Python; it isn't a reserved keyword, but assigning to it hides the built-in. For example, you can do myLst = list("abcd") to obtain myLst = ['a', 'b', 'c', 'd'].
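A sketch of the difference between a keyword and a built-in name. shadowing_demo is a hypothetical function; it rebinds list locally so the built-in stays intact for the rest of the script.

```python
import keyword

print(keyword.iskeyword('list'))  # False: list is a built-in name, not a keyword
myLst = list("abcd")
print(myLst)                      # ['a', 'b', 'c', 'd']

def shadowing_demo():
    list = [1, 2, 3, 4]           # this local name hides the built-in list type
    return list("abcd")           # so calling it fails

try:
    shadowing_demo()
    shadow_error = None
except TypeError as exc:          # 'list' object is not callable
    shadow_error = exc
```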
As for the solution to your problem, I recommend using a with statement in case you forget to close your file. That way you don't have to close the file yourself; leaving the indented block does it for you. Here is how I have solved your problem:
# I just made a list using a generator expression to avoid writing so much manually.
myLst=list("List number {}".format(x) for x in range(15))
# Here is where you open the file
with open('testfile.txt','w') as file:
    for each in myLst:
        file.write(str(each))
        if each!=myLst[len(myLst)-1]:
            file.write('\n')
        else:
            # "continue" skips the rest of the current iteration and moves on to the next one.
            # Here it is the last statement anyway, so it changes nothing.
            continue
I hope I was helpful.
I'd use a loop:
thefile = open('test.txt', 'w')
for item in thelist:
    thefile.write("%s\n" % item)
thefile.close()
I'm pretty new at this, so the answer to this problem might be rather easy. I'm just not seeing it at the moment.
I have a function that counts the number of lines in a file that contain a given word. Within the function I use with file as fin, as shown below. If I try to reuse the function, it gives an error saying that the file is closed. I solved it, but it looks bad (IMO):
def lcount(keyword):
    with file_to_count as fin:
        return sum([1 for line in fin if keyword in line])
file_to_count = open('peer.txt', 'r')
y = lcount('bars')
print y
file_to_count = open('peer2.txt', 'r')
w = lcount('bar')
file_to_count = open('Peer2.txt', 'r')
e = lcount('table')
print w, e
If I don't repeat
file_to_count = open('Peer2.txt', 'r')
a second time (after I count 'bar'), I get an I/O error when 'table' goes through the function.
So the code works, but I want to use lcount for other words. Do I need to reopen file_to_count every time, or are there solutions/alternatives?
Thanks for your attention
The problem is that you're using the with statement, which works with a context manager: it makes sure that the file is closed at the end of the block. So your function closes the file, and then you need to open it again.
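A sketch that reproduces the failure, assuming io.StringIO with made-up contents as a stand-in for open('peer2.txt', 'r'). After the first call the global file object is closed, and entering with on it again raises ValueError.

```python
import io

# Stand-in for file_to_count = open('peer2.txt', 'r'); contents are made up.
file_to_count = io.StringIO("bars\nbar\ntable\n")

def lcount(keyword):
    with file_to_count as fin:  # leaving this block closes file_to_count
        return sum(1 for line in fin if keyword in line)

first = lcount('bar')
print(first)                    # 2
print(file_to_count.closed)     # True

try:
    lcount('table')             # entering "with" on a closed file fails
    second_error = None
except ValueError as exc:       # "I/O operation on closed file."
    second_error = exc
```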
But also, using a global object isn't a great idea; as you can see, it can (and will) introduce a lot of subtle bugs. Try to make functions depend only on their parameters.
Like this:
def lcount(fname, keyword):
    with open(fname) as fin:
        # Use a generator expr. to avoid intermediate list
        return sum(1 for line in fin if keyword in line)
        # Or better, since True == 1:
        # return sum(keyword in line for line in fin)
    # file is closed now
fname = "peer2.txt"
words = "bars bar table".split()
# don't repeat yourself
for word in words:
    print lcount(fname, word)
You can refactor your function with an argument for the file name:
def lcount(filename, keyword):
    with open(filename, 'r') as fin:
        return sum([1 for line in fin if keyword in line])
and use it like this:
y = lcount('peer.txt','bar')
w = lcount('peer2.txt','bar')
That's a strange way to handle file I/O through a context manager; it's more intuitive and more common to use:
def lcount(fname, keyword):
    with open(fname) as fin:
        return sum(1 for line in fin if keyword in line)
Also, sum accepts a generator, so there is no need to build a list before calling sum.
Note
If you are curious why your approach fails: at the end of the block, the with statement calls the __exit__ method of the object produced by the expression, which for a file object closes the file. So it's important to put the expression that creates the object inside the with statement itself, so that you don't have to reopen the file manually every time you enter the with block.
You should restructure things entirely. The with statement is meant to provide a scope for an object, so that the object is cleanly closed at the end. I'm not sure file_to_count is available as a global within the with statement, but that's a bad practice anyway - better to pass the file into the function.
def lcount(keyword, fin):
    return sum([1 for line in fin if keyword in line])
with open('peer.txt', 'r') as file_to_count:
    y = lcount('bars', file_to_count)
print y
with open('peer2.txt', 'r') as file_to_count:
    w = lcount('bar', file_to_count)
    file_to_count.seek(0) # start over at the beginning of the file
    e = lcount('table', file_to_count)
I'm new here and I'm also new to Python. I would like your help, please.
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
    if prefix != word[0]:
        return 0
    else:
        return dictionary[prefix]
    funf.close()
When I run this:
inpath = "filetext1.txt"
print(lines(inpath,"But"))
I get this:
Traceback (most recent call last):
File "C:...\...\....py", line 29, in <module>
print(lines(inpath,"This"))
File "C:...\...\....py", line 11, in lines
if prefix != word[0]:
UnboundLocalError: local variable 'word' referenced before assignment
What is the problem, how can I change it so it would be better?
I'm asking for ideas and options (but please, without changing too much in the code; it has to keep something like this structure!)
Thanks!
In your code, the if prefix != word[0] part is happening outside the loop, after the loop has finished running. So, for a non-empty file, word will be the split of the last line of the file. And for an empty file, word will never have been set, causing exactly the error you posted.
As a side note, that for lines in f: is looping over some global object f, not the file you just opened, which is called funf. So I suspect that f is some kind of empty iterable, and you're seeing this error even when the file you wanted to look at is not empty. If you want to loop over funf, you have to tell Python funf, not f.
And you already know this isn't correct, as in this comment:
word is the split of line. I can't do it outside the for loop
If you want to run it inside the loop, you will need to indent it to match the code inside the loop. In Python, block structure is based on indentation level:
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
        if prefix != word[0]:
            return 0
        else:
            return dictionary[prefix]
    funf.close()
That means you'll no longer get this error; word will always be defined when you use it.
There are other problems with this code: you're returning after each line, meaning you'll never get to the second line; you're returning before you close the file, meaning the file never gets closed; it's very misleading to use plural variables names for individual things and singular variable names for lists of things; it's confusing to use a local variable with the same name as the function; etc. But one thing at a time…
After half an hour of pulling teeth, you finally explained what you're trying to do:
I'm trying to count the number of lines whose first word matches the prefix
There is no way to do that with this structure. Whether you do the if inside the loop or out, it doesn't make any sense.
The simplest way to fix it is to remove the if entirely. You're building up a dictionary of counts of each first word, right? So, just look up the value for the given prefix at the end:
def lines(path, prefix):
    funf = open(path, 'r')
    dictionary = {}
    for lines in funf:
        word = lines.split()
        a_word = (word[0])
        dictionary[a_word] = dictionary.get(a_word, 0) + 1
    funf.close()
    return dictionary.get(prefix, 0)
This will work, but it's incredibly wasteful to build up this whole dictionary just to get a single value out of it, and makes your code much more complicated as well… the whole thing could be written as:
def lines(path, prefix):
    with open(path) as f:
        return sum(1 for line in f if line.startswith(prefix))
Here's my filetext1.txt:
This is a test.
But this isn't.
But this is.
And this isn't.
The output should obviously be 2, right?
And both versions of my code (the "simplest fix" and the two-liner) print this:
2
This works in both Python 3.3 and 2.7. If it's not working for you, either you failed at copying and pasting the code, or your input file doesn't have any lines starting with "But ".
If you're trying to count the number of lines whose first word match the prefix, why not do something simple like
def lines(path, prefix):
    N_matches = 0
    f = open(path, 'r')
    for line in f:
        words = line.split()
        first_word = words[0]
        if first_word == prefix:
            N_matches += 1
    f.close()
    return N_matches
This could also be accomplished with less code:
def lines(path, prefix):
    with open(path, 'r') as f:
        return sum([1 for line in f if line.split()[0] == prefix])
As @abarnert points out, an even better way is
return sum(1 for line in f if line.startswith(prefix))
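Two details are worth checking: line.split()[0] raises IndexError on a blank line, and startswith("But") also matches words such as "Butter". A sketch that handles both, with count_prefix_lines as a hypothetical name that takes any iterable of lines (e.g. an open file):

```python
def count_prefix_lines(fin, prefix):
    """Count lines whose first whitespace-separated word equals prefix.

    Blank lines are skipped, so there is no IndexError from split()[0].
    """
    count = 0
    for line in fin:
        words = line.split()
        if words and words[0] == prefix:  # empty list means a blank line
            count += 1
    return count

sample = ["This is a test.\n", "But this isn't.\n", "\n",
          "But this is.\n", "And this isn't.\n", "Butter is not a conjunction.\n"]
print(count_prefix_lines(sample, "But"))                    # 2
print(sum(1 for line in sample if line.startswith("But")))  # 3: "Butter" matches too
```

With a real file: with open(path) as f: n = count_prefix_lines(f, "But").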
I just started learning Python three weeks ago, so I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line in the file. I just made a random file, named it myfile, and saved it to my desktop.
myfile= open('myfile', 'r')
line= myfile.readlines()
len(max(line))-1
# the "-1" is to remove the \n
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du=os.popen('du/urs/local')
while 1:
    line= du.readline()
    if not line:
        break
    if list(line).count('/')==3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need:
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line) > max_len:  # ... something
        pass  # something
myfile.close()
Note that this is a crappy way to loop over a file. It relies on readline() returning an empty string only at the end of the file (a blank line comes back as '\n', so it doesn't stop early). But homework is homework...
max(['b','aaa']) is 'b'
This lexicographic order isn't what you want to maximise; you can use the key argument to choose a different function to maximise by, such as len.
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(line, key=len)), where line is the list returned by myfile.readlines() in your code.
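A quick sketch of the difference, with a made-up list standing in for what readlines() might return:

```python
lines = ['b\n', 'aaa\n', 'cc\n']       # what readlines() might return

print(max(lines))                      # 'cc\n': largest in lexicographic order
print(max(lines, key=len))             # 'aaa\n': the longest line
print(len(max(lines, key=len)) - 1)    # 3: length without the trailing '\n'
```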
A more elegant solution would be to use a generator expression:
max(len(line)-1 for line in myfile.readlines())
As an aside, you should open the file in a with statement; it will take care of closing the file after the indented block:
with open('myfile', 'r') as mf:
    print max(len(line)-1 for line in mf.readlines())
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument to extract that from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see if its length is greater than the longest one you have seen so far, which you could store in a separate variable initialized to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1 # Nothing was read so far
with open("filename.txt", "r") as f: # Opens the file and magically closes at the end
    for line in f:
        max_len = max(max_len, len(line))
print max_len
As this is homework... I would ask myself whether I should count the line feed character or not. If you need to drop it, change len(line) to len(line.rstrip('\n')).
If you have to use while, try this:
max_len = -1 # Nothing was read
with open("t.txt", "r") as f: # Opens the file
    while True:
        line = f.readline()
        if len(line) == 0:
            break
        max_len = max(max_len, len(line.rstrip('\n')))
print max_len
For those still in need, here is a little function that returns the longest line itself (rather than its length):
def get_longest_line(filename):
    length_lines_list = []
    with open(filename, "r") as open_file_name:  # closed automatically, even though we return early
        all_text = open_file_name.readlines()
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()