I am having the following issues with my code below. Please point out where it is going wrong.
change_ignore_base.txt and change_ignore_file.txt are not getting created. Where is it going wrong?
I see that the entries in change_ignore have "\r" and "\n" appended. What is a smart way to strip them off and store the values in a variable that can later be used for searching?
change_ids.txt
206061
150362
147117
147441
143446
200912
change_ignore.txt
150362
147117
147441
143446
200914
Code
import os
import subprocess
from subprocess import check_call

def sync (base_change):
    # open a file
    with open('change_ignore.txt') as f:
        change_ignore = f.readlines()
    print "change_ignore"
    print change_ignore
    with open('change_ids.txt') as f:
        lines = f.readlines()
        for line in lines:
            line=line.strip()
            print line
            if line <= base_change:
                print "IN line<=base_change"
                print line
                with open("change_ignore_base.txt", "a") as myfile:
                    myfile.write(line)
            if line in change_ignore:
                print "IN change_ignore"
                print line
                with open("change_ignore_file.txt", "a") as myfile:
                    myfile.write("line")
            if line > base_change and line not in change_ignore:
                pass

def main ():
    base_change=200913
    sync(base_change)

if __name__ == '__main__':
    main()
Here is a mild adjustment to your program that I believe accomplishes what you want. Key points (as pointed out in the comments) are that you want to compare integers with integers, and that you should avoid opening/closing files multiple times (as was happening with the file appends inside the loop).
import os
import subprocess
from subprocess import check_call

def sync(base_change):
    # Generate a list of integers based on your change_ignore file
    with open('change_ignore.txt', 'rb') as f:
        # Here we make a list of integers based on the file
        change_ignore = [int(line.strip()) for line in f]
    # Store your hits/misses in lists; that way you do not
    # need to continuously open/close files while appending
    change_ignore_base = []
    change_ignore_file = []
    # Now open the file of the IDs
    with open('change_ids.txt', 'rb') as f:
        # Iterate over the file itself
        for line in f:
            # Convert the line to an integer (note that this
            # implicitly removes the newline characters)
            # However we are going to write 'line' to our list,
            # which will keep the newline (more on that later)
            num = int(line)
            print num
            # Now we are comparing ints with ints
            # I'm assuming the print statements are for debugging,
            # so we offset them with some space, making it so that
            # any relevant hits are indented under a number
            if num <= base_change:
                print " IN line<=base_change"
                change_ignore_base.append(line)
            if num in change_ignore:
                print " IN change_ignore"
                change_ignore_file.append(line)
            if num > base_change and num not in change_ignore:
                pass
    # Now that you have lists containing the data for your new files,
    # write them (they already have newlines appended so writelines works)
    # You can use 'with' with two files in this way in Python 2.7+,
    # but it goes over 80 characters here so I'm not a huge fan :)
    with open('change_ignore_base.txt', 'wb') as b, open('change_ignore_file.txt', 'wb') as f:
        b.writelines(change_ignore_base)
        f.writelines(change_ignore_file)

def main ():
    base_change=200913
    sync(base_change)

main()
This should create your files and print the following:
206061
150362
 IN line<=base_change
 IN change_ignore
147117
 IN line<=base_change
 IN change_ignore
147441
 IN line<=base_change
 IN change_ignore
143446
 IN line<=base_change
 IN change_ignore
200912
 IN line<=base_change
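For readers on Python 3, here is a minimal sketch of the same idea: strip the "\r" and "\n" from each line, convert to int, and use a set for the membership test. The helper names `load_ids` and `split_changes` are made up for this illustration, and the sample data is taken from the two files in the question.

```python
def load_ids(lines):
    # strip() removes surrounding whitespace, including "\r" and "\n";
    # blank lines are skipped so int() never sees an empty string.
    return [int(line.strip()) for line in lines if line.strip()]


def split_changes(change_ids, change_ignore, base_change):
    # A set makes each "in" test cheap compared with scanning a list.
    ignore = set(change_ignore)
    at_or_below_base = [c for c in change_ids if c <= base_change]
    ignored = [c for c in change_ids if c in ignore]
    return at_or_below_base, ignored


# Sample data from the question, with Windows-style line endings.
ids = load_ids(["206061\r\n", "150362\r\n", "147117\r\n",
                "147441\r\n", "143446\r\n", "200912\r\n"])
ignore_ids = load_ids(["150362\r\n", "147117\r\n", "147441\r\n",
                       "143446\r\n", "200914\r\n"])

at_or_below_base, ignored = split_changes(ids, ignore_ids, 200913)
print(at_or_below_base)
print(ignored)
```

The two resulting lists can then be written out once each, for example with `"\n".join(str(c) for c in at_or_below_base)`.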
Related
I am trying to store three different variables (which are results from a for loop) as one line in a file. Here is my entire code, in case you are wondering what I am trying to do:
from Bio.PDB import *
from Bio import SeqIO
from glob import glob
parser = PDBParser(PERMISSIVE=True)
pdb_files = glob('/pdb_RF0001/*')
for fileName in pdb_files:
    structure_id = fileName.rsplit('/', 1)[1][:-4]
    structure = parser.get_structure(structure_id, fileName)
    model = structure[0]
    for residue1 in structure.get_residues():
        for residue2 in structure.get_residues():
            if residue1 != residue2:
                try:
                    distance = residue1['P'] - residue2['P']
                except KeyError:
                    continue
                f = open('%s.txt' % fileName, 'w')
                line = str(residue1)+','+str(residue2)+','+str(distance)
                f.write(line)
                f.close()
                break
Sample code for check :
f = open('%s.txt' % fileName, 'wb')
line = int(residue1)+','+int(residue)+','+float(distance)
f.write(line)
f.close()
How to store the three different variables from the line variable as one line in an output file?
Use f-string
line = f"{residue1}, {residue}, {distance}"
int(residue) is an integer, and float(distance) is a real number (specifically, a floating-point number, hence the "float"). Thus, in this line, you are trying to add numbers to strings:
line = int(residue1)+','+int(residue)+','+float(distance)
However, Python disallows this. What you probably want to do is convert residue1, residue, and distance from (what I assume are) numbers to strings, like this:
line = str(residue1)+','+str(residue)+','+str(distance)
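As a small self-contained sketch of writing three values as one line: the values below are hypothetical stand-ins for the residues and the distance from the question, and the output goes to a temporary file. Note that opening a file with mode 'w' inside a loop truncates it on every pass, so to collect many lines the file is best opened once, outside the loop.

```python
import os
import tempfile

# Hypothetical stand-in values; in the question these come from the residue loop.
residue1, residue2, distance = "RES_A", "RES_B", 3.75

out_path = os.path.join(tempfile.mkdtemp(), "distances.txt")

# Open the file once and write one comma-separated line per record.
with open(out_path, "w") as f:
    f.write(f"{residue1},{residue2},{distance}\n")

with open(out_path) as f:
    content = f.read()
print(content, end="")
```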
str.format() is another of Python's string formatting methods. Pass the variables themselves, not their names as string literals:
line = "{}, {}, {}".format(residue1, residue, distance)
How could I print the final line of a text file read in with Python?
fi=open(inputFile,"r")
for line in fi:
    #go to last line and print it
One option is to use file.readlines():
f1 = open(inputFile, "r")
last_line = f1.readlines()[-1]
f1.close()
If you don't need the file afterwards, though, it is recommended to open it in a with statement so that it is closed automatically:
with open(inputFile, "r") as f1:
    last_line = f1.readlines()[-1]
Do you need to avoid reading all the lines into memory at once? Then iterate over the file object instead:
with open(inputFile, "r") as f:
    for line in f: pass
print line #this is the last line of the file
Three ways to read the last line of a file:
For a small file, read the entire file into memory:
with open("file.txt") as file:
    lines = file.readlines()
    print(lines[-1])
For a big file, read line by line and print the last line:
with open("file.txt") as file:
    for line in file:
        pass
    print(line)
For an efficient approach, go directly to the last line:
import os
with open("file.txt", "rb") as file:
    # Go to the end of the file, just before the final line break
    file.seek(-2, os.SEEK_END)
    # Keep reading backward until you find the previous line break
    while file.read(1) != b'\n':
        file.seek(-2, os.SEEK_CUR)
    print(file.readline().decode())
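The backward-seeking approach raises an OSError when the file has only one line or is empty, because the seek tries to move before the start of the file. A hedged sketch that guards against this follows; the function name `read_last_line` and the temporary sample files are assumptions for illustration.

```python
import os
import tempfile


def read_last_line(path):
    # Binary mode is required for end-relative seeks in Python 3.
    with open(path, "rb") as f:
        try:
            # Start just before the final byte (usually a newline).
            f.seek(-2, os.SEEK_END)
            # Step backward until the previous newline is found.
            while f.read(1) != b"\n":
                f.seek(-2, os.SEEK_CUR)
        except OSError:
            # Ran past the start: the file has at most one line.
            f.seek(0)
        return f.readline().decode()


tmp_dir = tempfile.mkdtemp()

multi_path = os.path.join(tmp_dir, "multi.txt")
with open(multi_path, "wb") as f:
    f.write(b"a\nbb\nccc\n")

single_path = os.path.join(tmp_dir, "single.txt")
with open(single_path, "wb") as f:
    f.write(b"only line")

last_of_multi = read_last_line(multi_path)
last_of_single = read_last_line(single_path)
print(repr(last_of_multi), repr(last_of_single))
```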
If you can afford to read the entire file into memory (that is, the file is considerably smaller than the available memory), you can use the readlines() method mentioned in another answer. If the file is large, a better way is to iterate over it and keep only the most recent line:
fi=open(inputFile, 'r')
lastline = ""
for line in fi:
    lastline = line
print lastline
You could use csv.reader() to read your file as a list of rows and print the last one.
Cons: This method loads every row into a list in memory, so it is not suitable for very large files.
Pros: List lookups take O(1) time, and the list is easy to manipulate if you also want to modify the contents of inputFile rather than just read the final line.
import csv
lis = list(csv.reader(open(inputFile)))
print lis[-1] # prints final line as a list of strings
If you care about memory this should help you.
import os
last_line = ''
# Note: end-relative seeks on a text-mode file work in Python 2;
# in Python 3 the file must be opened in binary mode ('rb') for this
with open(inputFile, "r") as f:
    f.seek(-2, os.SEEK_END) # -2 because last character is likely \n
    cur_char = f.read(1)
    while cur_char != '\n':
        last_line = cur_char + last_line
        f.seek(-2, os.SEEK_CUR)
        cur_char = f.read(1)
print last_line
This might help you.
class FileRead(object):
    def __init__(self, file_to_read=None,file_open_mode=None,stream_size=100):
        super(FileRead, self).__init__()
        self.file_to_read = file_to_read
        self.file_to_write='test.txt'
        self.file_mode=file_open_mode
        self.stream_size=stream_size
    def file_read(self):
        # Yields the file in chunks of stream_size characters (not whole lines)
        try:
            with open(self.file_to_read,self.file_mode) as file_context:
                contents=file_context.read(self.stream_size)
                while len(contents)>0:
                    yield contents
                    contents=file_context.read(self.stream_size)
        except Exception as e:
            if type(e).__name__=='IOError':
                output="You have a file input/output error {}".format(e.args[1])
                raise Exception (output)
            else:
                output="You have a file error {} {} ".format(file_context.name,e.args)
                raise Exception (output)
b=FileRead("read.txt",'r')
contents=b.file_read()
lastline = ""
for content in contents:
    # print '-------'
    lastline = content
print lastline
I use the pandas module for its convenience (often to extract the last value).
Here is the example for the last row:
import pandas as pd
df = pd.read_csv('inputFile.csv')
last_value = df.iloc[-1]
The return is a pandas Series of the last row.
The advantage of this is that you also get the entire contents as a pandas DataFrame.
I'm trying to delete a specific line (10884121) in a text file that is about 30 million lines long. This is the method I first attempted. However, when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way to do this? Thanks!
import fileinput
import sys
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
with open(f_in, 'r') as fin:
    with open(f_out, 'w') as fout:
        linenums = [10884121]
        s = [y for x, y in enumerate(fin) if x not in [line - 1 for line in linenums]]
        fin.seek(0)
        fin.write(''.join(s))
        fin.truncate(fin.tell())
First of all, you were not using the imports; you were trying to write to the input file, and your code read the whole file into memory.
Something like this might do the trick with less hassle. We read line by line and use enumerate to count the line numbers; each line is written to the output if its number is not in the list of ignored lines:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
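To try the streaming approach on something small first, here is a minimal sketch wrapped in a function and run against a four-line temporary file. The name `delete_lines` and the sample content are assumptions for illustration; only one line is held in memory at a time, so the same function should behave the same way on a 30-million-line file.

```python
import os
import tempfile


def delete_lines(src_path, dst_path, line_numbers):
    # Line numbers are 1-based; a set keeps the lookup cheap.
    skip = set(line_numbers)
    with open(src_path) as src, open(dst_path, "w") as dst:
        for lineno, line in enumerate(src, 1):
            if lineno not in skip:
                dst.write(line)


tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "in.txt")
dst_path = os.path.join(tmp_dir, "out.txt")

with open(src_path, "w") as f:
    f.write("one\ntwo\nthree\nfour\n")

# Drop the second line and copy everything else.
delete_lines(src_path, dst_path, [2])

with open(dst_path) as f:
    result = f.read()
print(result, end="")
```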
Please try using fileinput:
import fileinput
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
f = open(f_out,'w')
counter=0
for line in fileinput.input([f_in]):
    counter=counter+1
    if counter != 10884121:
        f.write(line) # each line already ends with its newline, so nothing needs to be appended
f.close() # close explicitly so the output is flushed to disk
You most likely run out of memory because you are trying to store the whole file in a list.
Try this instead:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
_fileOne = open(f_in,'r')
_fileTwo = open(f_out,'w')
linenums = set([10884121])
# Start counting at 1 so that line numbers match the numbers in linenums
for lineNumber, line in enumerate(_fileOne, 1):
    if lineNumber not in linenums:
        _fileTwo.write(line)
_fileOne.close()
_fileTwo.close()
Here we read the file line by line and skip the lines that are not needed, so memory use stays small regardless of the file size.
You can also try reading the file using buffering.
Hope this helps.
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.
    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line
with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)
I have the very simple task of creating a text file with 8 random integers from 1-100, reading the file, displaying the numbers on the same line, calculating the even integers and the odd integers, and then displaying them.
The problem I am having is getting the string to display on the same line. I have browsed multiple articles about similar problems to no avail. I have attempted to use .join, however, it seems to break the code when I include it.
# Imports random and time
import random
import time
# Defines the main function
def main():
    # Opens file "mynumbers" and creates it if not existent
    myfile = open('mynumbers.txt', 'w')
    # Statement to write integers to text file in the correct format
    for count in range(8):
        number = random.randint(1, 100)
        myfile.write(str(number) + '\n')
# Defines read function
def read():
    # Opens the "mynumbers" file created in the main function
    myfile= open('mynumbers.txt', 'r')
    # Sets the content variable to the content of the file that was opened
    content = myfile.read()
    # Prints the content variable and strips the \n from the string
    stripit = content.rstrip('\n')
    print(stripit)
# Calls for the functions, prints created, and sleep calls
main()
print('File Created!')
time.sleep(1)
read()
time.sleep(5)
Any help that can be provided would be greatly appreciated.
Your read function is reading the whole file contents into a single string. Your rstrip call on that string removes the last newline from it, but not any of the internal newlines. You can't effectively use str.join, since you only have the one string.
I think there are two reasonable solutions. The first is to stay with just a single string, but replace all the internal newlines with spaces:
def read():
    myfile = open('mynumbers.txt', 'r')
    content = myfile.read()
    stripit = content.rstrip('\n')
    nonewlines = stripit.replace('\n', ' ')
    print(nonewlines)
The other approach is to split the single string up into a list of separate strings, one for each number. This is more useful if we need to do different things with them later. Of course, all we're going to do is use join to combine them back together:
def read():
    myfile = open('mynumbers.txt', 'r')
    content = myfile.read()
    content_list = content.split() # by default, splits on any kind of whitespace
    rejoined_content = " ".join(content_list)
    print(rejoined_content)
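Since the task also asks for the even and odd counts, the split-into-a-list approach extends naturally. The following is only a sketch under the assumption that the file holds one integer per line, as in the question; it writes to a temporary file rather than mynumbers.txt so that it can be run anywhere.

```python
import os
import random
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mynumbers.txt")

# Write 8 random integers, one per line, as in the question.
with open(path, "w") as f:
    for _ in range(8):
        f.write(str(random.randint(1, 100)) + "\n")

# split() with no arguments splits on any whitespace, including newlines.
with open(path) as f:
    numbers = [int(token) for token in f.read().split()]

# join() needs strings, so convert each number back before joining.
print(" ".join(str(n) for n in numbers))

evens = [n for n in numbers if n % 2 == 0]
odds = [n for n in numbers if n % 2 != 0]
print("Even:", len(evens), "Odd:", len(odds))
```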
Don't add a newline char when you write the file. Just use a space instead (or comma, whatever)
import random
import time
#Defines the main function
def main():
    #Opens file "mynumbers" and creates it if not existent
    myfile = open('mynumbers.txt', 'w')
    #Statement to write integers to text file, separated by spaces
    for count in range(8):
        number = random.randint(1,100)
        myfile.write(str(number) +' ')
#Defines read function
def read():
    #Opens the "mynumbers" file created in the main function
    myfile= open('mynumbers.txt', 'r')
    #Sets the content variable to the content of the file that was opened
    content=myfile.read()
    #Prints the content variable (already on one line, so nothing to strip)
    print(content)
#Calls for the functions, prints created, and sleep calls
main()
print('File Created!')
time.sleep(1)
read()
time.sleep(5)
The code looks great, but do this instead in your read() function:
def read():
    my_numbers = []
    with open('mynumbers.txt', 'r') as infile:
        for line in infile:
            line = line.strip()
            my_numbers.append(line)
    # Join the collected numbers, not the characters of the last line
    print (' '.join(my_numbers))
I would do it like this, especially because you mentioned the even and odd part that you'll need to do next. At the end of the first loop, you'll have a list of ints (rather than strs) that you can work with to determine whether they are even or odd.
def read():
    my_nums = []
    with open('mynumbers.txt', 'r') as f:
        for line in f:
            num_on_line = int(line.strip())
            my_nums += [num_on_line]
            print num_on_line, #don't forget that comma
    for num in my_nums:
        #display the even and odds
        pass
You could print the numbers on a single line this way:
with open('mynumbers.txt', 'r') as numbers_file:
    for line in numbers_file:
        print(line.strip(), end=" ")
The line.strip() call removes the trailing \n character.
I just started learning Python 3 weeks ago, so I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line of code in the file. I just made a random file, named it myfile, and saved it to my desktop.
myfile= open('myfile', 'r')
line= myfile.readlines()
len(max(line))-1
# (the "-1" is to remove the \n)
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du=os.popen('du/urs/local')
while 1:
    line= du.readline()
    if not line:
        break
    if list(line).count('/')==3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need:
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line) # ... somethin
        # something
Note that this is a crappy way to loop over a file. It relies on readline() returning an empty string at the end of the file. But homework is homework...
max(['b','aaa']) is 'b'
This lexicographic order isn't what you want to maximise. You can use the key argument to choose a different function to maximise, like len.
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(line, key=len)) - 1, where line is the list returned by readlines().
A more elegant solution would be to use a generator expression:
max ( len(line)-1 for line in myfile.readlines() )
As an aside, you should open the file using a with statement, which takes care of closing the file at the end of the indented block:
with open('myfile', 'r') as mf:
    print max ( len(line)-1 for line in mf.readlines() )
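The difference between the default ordering and key=len can be checked on a small in-memory list. The sample strings below are made up for illustration and stand in for what readlines() would return.

```python
# Sample lines as readlines() would return them, newline included.
lines = ["b\n", "aaa\n", "cc\n"]

# Without a key, strings are compared lexicographically.
lexicographic_max = max(lines)

# With key=len, max() returns the longest string instead.
longest = max(lines, key=len)

# rstrip("\n") drops the newline only if it is present, which is safer
# than subtracting 1 when the last line of a file has no newline.
longest_length = len(longest.rstrip("\n"))

print(repr(lexicographic_max), repr(longest), longest_length)
```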
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument to extract that from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see if its length was greater than the longest one you had seen so far, which you could store in a separate variable and initialize to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1 # Nothing was read so far
with open("filename.txt", "r") as f: # Opens the file and closes it automatically at the end
    for line in f:
        max_len = max(max_len, len(line))
print max_len
As this is homework... I would ask myself whether I should count the line feed character or not. If you need to chop the last char, replace len(line) with len(line[:-1]).
If you have to use while, try this:
max_len = -1 # Nothing was read
with open("t.txt", "r") as f: # Opens the file
    while True:
        line = f.readline()
        if(len(line)==0):
            break
        max_len = max(max_len, len(line[:-1]))
print max_len
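A Python 3 variant of the while-loop version is sketched below. It swaps line[:-1] for rstrip("\n") so that a final line without a trailing newline does not lose its last character, and it starts the maximum at 0, so an empty file yields 0 rather than -1. The function name `longest_line_length` and the sample file are assumptions for illustration.

```python
import os
import tempfile


def longest_line_length(path):
    max_len = 0
    with open(path) as f:
        while True:
            line = f.readline()
            # readline() returns an empty string only at end of file.
            if not line:
                break
            max_len = max(max_len, len(line.rstrip("\n")))
    return max_len


path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    # The last line deliberately has no trailing newline.
    f.write("short\nthe longest line\nmid\nend")

result = longest_line_length(path)
print(result)
```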
For those still in need. This is a little function which does what you need:
def get_longest_line(filename):
    length_lines_list = []
    # The with statement closes the file even though the function returns early
    with open(filename, "r") as open_file_name:
        all_text = open_file_name.readlines()
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()