How to process filenames separated by new line? - python

I'm using two files:
1. findjava.py, which outputs the names of all Java files in a directory, separated by \n
2. countfile, which receives a single filename and counts its lines
In countfile I already receive a string containing the filenames (javafile1\njavafile2\njavafile3\n).
How can I loop through those filenames one by one?
I need to read the string up to each \n, use that part as the variable for my line-counting script, and then move on to the next filename.

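Split on \n.
Something like this (splitlines() avoids the empty string that split("\n") would leave after the trailing newline):
files = "javafile1\njavafile2\njavafile3\n"
list_of_files = files.splitlines()
for file in list_of_files:
    with open(file) as fh:
        lines = fh.readlines()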

You haven't posted any example data, but unless it is complicated by other factors, it seems you can simply split() the data. Note that split() with no argument splits on any whitespace, so this assumes the names contain no spaces:
files = "javafile1\njavafile2\njavafile3\n"
for name in files.split():
    print("counting lines in file " + name)
    countfile(name)

In Python you can use a for loop to read a file one line at a time:
with open("file_list") as stream:
    for filename in stream:
        filename = filename.strip()  # drop the trailing newline
        lines = 0
        with open(filename) as f:
            for line in f:
                lines += 1
        print(filename, lines)
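Putting the answers above together, here is a minimal sketch of the whole loop. The helper names count_lines and count_all are hypothetical (they are not from the question), and the sketch assumes the filenames arrive as one newline-separated string, as described by the asker.

```python
def count_lines(path):
    """Return the number of lines in the file at `path`."""
    with open(path) as fh:
        return sum(1 for _ in fh)


def count_all(names_blob):
    """Map each filename in a newline-separated string to its line count.

    `names_blob` is assumed to look like "a.java\nb.java\n".
    """
    counts = {}
    for name in names_blob.splitlines():
        name = name.strip()
        if name:  # skip blank entries, e.g. from a doubled newline
            counts[name] = count_lines(name)
    return counts
```

Calling count_all("javafile1\njavafile2\n") would return a dictionary keyed by filename, provided those files exist in the current directory.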

Related

to change a text file containing multiline strings

I have a text file consisting of multiline strings (hundreds of lines, actually). Each string starts with an '&' sign. I want to change my text file so that only the first 300 characters of each string remain in the new file. How can I do this in Python?
You can read the file and loop over its lines to do what you want. Strings are easily sliceable in Python, so you can take the first 300 characters and write them to another file.
with open(path, "r") as file:
    lines = file.readlines()
with open(newPath, "w") as newFile:
    for line in lines:
        # the slice end is exclusive, so 0:300 keeps exactly 300 characters
        newLine = line[0:300]
        newFile.write(newLine)
Hope this is what you meant
You could do something like this:
# Open output file in append mode
with open('output.txt', 'a') as out_file:
    # Open input file in read mode
    with open("input.txt", "r") as in_file:
        for line in in_file:
            # Take first 300 characters from line
            # I believe this works even when line is < 300 characters
            new_line = line[0:300]
            # Write new line to output
            # (You might need to add '\n' for new lines)
            out_file.write(new_line)
            print(new_line)
You can use the string method split to split the text on "&", then use slices to keep only the first 300 characters of each piece.
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        if line:  # skip the empty piece before the first "&"
            new_file.write("&{}\n".format(line[:300]))
This version preserves ends of line \n within your strings.
If you want to remove ends of line in each individual string, you can use replace:
with open("oldFile.txt", "rt") as old_file, open("newFile.txt", "wt") as new_file:
    for line in old_file.read().split("&"):
        if line:  # skip the empty piece before the first "&"
            new_file.write("&{}\n".format(line.replace("\n", "")[:300]))
Note that your new file will end with an empty line.
Also, depending on the size of your file, you may prefer a generator-based version over split, which loads the whole file content into memory as a list of strings.
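A generator-based version might look like the sketch below. It is only an illustration: the names iter_records and truncate_records are hypothetical, and it assumes every string begins with "&" at the start of a line, which the question does not state explicitly. Any text before the first "&" line would be yielded as a record of its own.

```python
def iter_records(lines):
    """Yield one '&'-prefixed record at a time from an iterable of lines."""
    record = []
    for line in lines:
        # a line starting with '&' closes the record collected so far
        if line.startswith("&") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)


def truncate_records(in_path, out_path, limit=300):
    """Write the first `limit` characters of each record, one record per block."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in iter_records(src):
            dst.write(record[:limit].rstrip("\n") + "\n")
```

Because the input file is iterated line by line, only one record is held in memory at a time.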

write output from print to a file in python

I have code that reads multiple text files and prints the last line.
from glob import glob
text_files = glob('C:/Input/*.txt')
for file_name in text_files:
    with open(file_name, 'r+') as f:
        lines = f.read().splitlines()
        last_line = lines[-3]
        print (last_line)
I want to redirect the print output to a text file so that I can check the sentences. Also, the text files contain many blank lines. I want to delete all the empty lines and write the last line of each file to an output file. When I try to write, only the last line of the last file read is written, not the last line of every file.
Can someone help?
Thanks,
Aarush
I think you have two separate questions.
Next time you use Stack Overflow, if you have multiple questions, please post them separately.
Question 1
How do I re-direct the output from the print function to a file?
For example, consider a hello world program:
print("hello world")
How do we create a file (named something like text_file.txt) in the current working directory, and output the print statements to that file?
ANSWER 1
Writing output from the print function to a file is simple to do:
with open('test_file.txt', 'w') as out_file:
    print("hello world", file=out_file)
Note that the print function accepts a keyword argument named "file".
You must write file=out_file in order to pass out_file to the print function.
QUESTION 2
How do I get the last non-blank line from a file? I have an input file which has lots of line feeds, carriage returns, and space characters at the end. We need to ignore blank lines and retrieve the last line of the file that contains at least one non-whitespace character.
Answer 2
def get_last_line(file_stream):
    # reversed() needs a sequence, so read the lines into a list first
    for line in reversed(file_stream.readlines()):
        # `strip()` removes all leading and trailing white-space characters
        # `strip()` removes `\n`, `\r`, `\t`, space chars, etc...
        line = line.strip()
        if len(line) > 0:
            return line
    # if the file contains nothing but blank lines
    # return the empty string
    return ""
You can process multiple files like so:
file_names = ["input_1.txt", "input_2.txt", "input_3.txt"]
with open('out_file.txt', 'w') as out_file:
    for file_name in file_names:
        with open(file_name, 'r') as read_file:
            last_line = get_last_line(read_file)
            print(last_line, file=out_file)
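Instead of just printing, also append each last line to the output file. Opening the file in 'w' mode inside the loop would overwrite it on every iteration, which is why only the last file's line survives, so open it in 'a' (append) mode:
print(last_line)
with open('output.txt', 'a') as fout:
    fout.write(last_line + '\n')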

Using a for loop to add a new line to a table: python

I am trying to create a .bed file after searching through DNA sequences for two regular expressions. Ideally, I'd like to generate a tab-separated file which contains the sequence description, the start location of the first regex and the end location of the second regex. I know that the regex section works, it's just creating the \t separated file I am struggling with.
I was hoping that I could open/create a file and simply print a new line for each iteration of the for loop that contains this information, like so:
with open("Mimp_hits.bed", "a+") as file_object:
    for line in file_object:
        print(f'{sequence.description}\t{h.start()}\t{h_rc.end()}')
    file_object.close()
But this doesn't seem to work (creates empty file). I have also tried to use file_object.write, but again this creates an empty file too.
This is all of the code I have including searching for the regexes:
import re, sys
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
infile = sys.argv[1]
for sequence in SeqIO.parse(infile, "fasta"):
    hit = re.finditer(r"CAGTGGG..GCAA[TA]AA", str(sequence.seq))
    mimp_length = 400
    for h in hit:
        h_start = h.start()
        hit_rc = re.finditer(r"TT[TA]TTGC..CCCACTG", str(sequence.seq))
        for h_rc in hit_rc:
            h_rc_end = h_rc.end()
            length = h_rc_end - h_start
            if length > 0:
                if length < mimp_length:
                    with open("Mimp_hits.bed", "a+") as file_object:
                        for line in file_object:
                            print(sequence.description, h.start(), h_rc.end())
                        file_object.close()
This is the desired output:
Focub_II5_mimp_1__contig_1.16(656599:656809) 2 208
Focub_II5_mimp_2__contig_1.47(41315:41540) 2 223
Focub_II5_mimp_3__contig_1.65(13656:13882) 2 224
Focub_II5_mimp_4__contig_1.70(61591:61809) 2 216
This is example input:
>Focub_II5_mimp_1__contig_1.16(656599:656809)
TACAGTGGGATGCAAAAAGTATTCGCAGGTGTGTAGAGAGATTTGTTGCTCGGAAGCTAGTTAGGTGTAGCTTGTCAGGTTCTCAGTACCCTATATTACACCGAGATCAGCGGGATAATCTAGTCTCGAGTACATAAGCTAAGTTAAGCTACTAACTAGCGCAGCTGACACAACTTACACACCTGCAAATACTTTTTGCATCCCACTGTA
>Focub_II5_mimp_2__contig_1.47(41315:41540)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTCTGCCGCTAGCCCATTTTAACAGCTAGAGTGTGTATATTAACCTCACACATAGCTATCTCTTATACTAATTGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTGTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_3__contig_1.65(13656:13882)
TACAGTGGGAGGCAATAAGTATGAATACCGGGCGTGTATTGTTTTTCTGCCGCTAGCCTATTTTAATAGTTAGAGTGTGCATATTAACCTCACACATAGCTATCTTATATACTAATCGGTTAGGGAAAACCTCTAACCAGGATTAGGAGTCAACATAGCTTCTTTTAGGCTAAGAGGTGTGTGTCAGTACACCAAAGGGTATTCATACTTATTGCCCCCCACTGTA
>Focub_II5_mimp_4__contig_1.70(61591:61809)
TACAGTGGGATGCAATAAGTTTGAATGCAGGCTGAAGTACCAGCTGTTGTAATCTAGCTCCTGTATACAACGCTTTAGCTTGATAAAGTAAGCGCTAAGCTGTATCAGGCAAAAGGCTATCCCGATTGGGGTATTGCTACGTAGGGAACTGGTCTTACCTTGGTTAGTCAGTGAATGTGTACTTGAGTTTGGATTCAAACTTATTGCATCCCACTGTA
Is anybody able to help?
Thank you :)
To write a line to a file, you would do something like this:
with open("file.txt", "a") as f:
    print("new line", file=f)
If you want it tab-separated, you can also add sep="\t". This is why Python 3 made print a function: so you can use the sep, end, file, and flush keyword arguments. :)
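As a small illustration of sep together with file, the sketch below writes one tab-separated row. An io.StringIO buffer stands in for an open output file so the result can be inspected, and the values "seq_1", 2, and 208 are made-up placeholders rather than real hits.

```python
import io

# StringIO behaves like a text file opened for writing (stand-in for the .bed file)
buffer = io.StringIO()

# sep="\t" puts a tab between the arguments; print adds the newline itself
print("seq_1", 2, 208, sep="\t", file=buffer)

bed_line = buffer.getvalue()
```

With a real file, the same print call would go inside a with open(...) block and receive that file object instead of the buffer.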
Opening a file for appending means the file pointer starts at the end of the file. Writing to it therefore doesn't overwrite any data (new output gets appended to the end of the file), and iterating over it (or otherwise reading from it) yields nothing, because you have already reached the end of the file.
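The effect is easy to reproduce. The following sketch uses a throwaway temporary file (the name demo.bed is hypothetical) and shows that iterating over a file opened in "a+" mode yields no lines until the position is moved back to the start.

```python
import os
import tempfile

# create a scratch file that already contains one line
path = os.path.join(tempfile.mkdtemp(), "demo.bed")
with open(path, "w") as f:
    f.write("existing line\n")

with open(path, "a+") as f:
    # the position starts at the end of the file, so nothing is read
    lines_at_end = list(f)
    # after rewinding, the existing content becomes visible
    f.seek(0)
    lines_from_start = list(f)
```

This is why a for loop over the freshly opened "a+" file never runs its body, and nothing gets printed or written.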
So instead of iterating over the lines of the file, you would just write the single line to it:
with open("Mimp_hits.bed", "a") as file_object:
    print(sequence.description, h.start(), h_rc.end(), file=file_object)
You can also consider opening the file once before the loop, since opening it once and writing multiple times is more efficient than opening it multiple times. Also, the with block automatically closes the file, so there is no need to do that explicitly.
You are trying to open the file in "a+" mode, and loop over lines from it (which will not find anything because the file is positioned at the end when you do that). In any case, if this is an output file only, then you would open it in "a" mode to append to it.
Probably you just want to open the file once for appending, and inside the with statement, do your main loop, using file_object.write(...) when you want to actually append strings to the file. Note that there is no need for file_object.close() when using this with construct.
with open("Mimp_hits.bed", "a") as file_object:
    for sequence in SeqIO.parse(infile, "fasta"):
        # ... etc per original code ...
                    if length < mimp_length:
                        file_object.write("{}\t{}\t{}\n".format(
                            sequence.description, h.start(), h_rc.end()))

Strange behavior (maybe just me) from the python file handling code

I just started learning file handling in Python 3. The code I am trying to write accepts a user-provided file name, opens the file, then prints the total number of lines and characters in the file.
The question is that I have to declare different variables (fh and fl in the following example) for the line and character count separately.
In the following working-as-expected code, if I comment out fl = open(fname) line and change for line in fl: to for line in fh, then the line count in the output becomes zero (not expected).
fname = input('Enter the file name: ')
fh = open(fname)
text = fh.read()
fl = open(fname)
count = 0
for line in fl:
    count = count + 1
print("line count in", fname, ":", count)
print("word count in", fname, ":", len(text))
Does that mean in the future if I were to process string functions on the same file, I have to declare different variable and read the same file multiple times? Is there a way to achieve "read once, use many times" goal?
When you read lines from the file you opened, the offset you are at currently is also kept track of. So after you have read all the lines (as in the for-loop), the position will be at the end of the file.
You can use f.seek(0) to set the current position in the file back to the beginning. Then you can do the for line in f again and go through the same contents twice, without having to open it twice.
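A minimal sketch of the seek(0) approach, assuming a plain text file; the function name count_chars_and_lines is hypothetical:

```python
def count_chars_and_lines(path):
    """Return (character count, line count) using a single open file handle."""
    with open(path) as fh:
        text = fh.read()   # reading leaves the position at the end of the file
        fh.seek(0)         # rewind so the file can be iterated again
        line_count = sum(1 for _ in fh)
    return len(text), line_count
```

Without the seek(0) call, the loop would start at the end of the file and the line count would be zero, which is exactly the behavior described in the question.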
See file methods.
Of course you could also both store the contents and count the number of rows in a single loop as you read through the file; or use the readlines method to get a list of the rows in the file. Using the readlines method would be a good way to read the files only once and get the contents in a useful format.
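For the "read once, use many times" goal, one possible sketch is to read the content into a string a single time and derive every statistic from that string. The function name summarize and the choice of statistics are assumptions for illustration only.

```python
def summarize(path):
    """Read the file once and compute several counts from the in-memory text."""
    with open(path) as fh:
        text = fh.read()
    lines = text.splitlines()
    return {
        "lines": len(lines),
        "characters": len(text),       # includes newline characters
        "words": len(text.split()),    # whitespace-separated tokens
    }
```

Once the text is in a variable, it can be passed to as many string functions as needed without touching the file again. For very large files this trades memory for convenience.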

Remove whitespaces in the beginning of every string in a file in python?

How to remove whitespaces in the beginning of every string in a file with python?
I have a file myfile.txt with the strings as shown below in it:
_ _ Amazon.inc
Arab emirates
_ Zynga
Anglo-Indian
Those underscores are spaces.
The code must go through every line of the file and remove all the whitespace at the beginning of each line.
I've tried using lstrip, but that doesn't work for multiple lines, and neither does readlines().
Would using a for loop help?
All you need to do is read the lines of the file one by one and remove the leading whitespace from each line. After that, you can join the lines again and you'll get back the original text without the leading whitespace:
with open('myfile.txt') as f:
    line_lst = [line.lstrip() for line in f.readlines()]
    lines = ''.join(line_lst)
print(lines)
Assuming that your input data is in infile.txt, and you want to write this file to output.txt, it is easiest to use a list comprehension:
inf = open("infile.txt")
stripped_lines = [l.lstrip() for l in inf.readlines()]
inf.close()
# write the new, stripped lines to a file
outf = open("output.txt", "w")
outf.write("".join(stripped_lines))
outf.close()
To read the lines from myfile.txt and write them to output.txt, use
with open("myfile.txt") as input:
    with open("output.txt", "w") as output:
        for line in input:
            output.write(line.lstrip())
That will make sure that you close the files after you're done with them, and it'll make sure that you only keep a single line in memory at a time.
The above code works in Python 2.6 and later because of the with keyword (Python 2.5 additionally needs from __future__ import with_statement). For Python 2.4 you can use
input = open("myfile.txt")
output = open("output.txt", "w")
for line in input:
    output.write(line.lstrip())
if this is just a small script where the files will be closed automatically at the end. If this is part of a larger program, then you'll want to explicitly close the files like this:
input = open("myfile.txt")
try:
    output = open("output.txt", "w")
    try:
        for line in input:
            output.write(line.lstrip())
    finally:
        output.close()
finally:
    input.close()
You say you already tried lstrip and that it didn't work for multiple lines. The "trick" is to run lstrip on each individual line, like I do above. You can try the code out online if you want.
